Researchers at Google have developed a new AI tool called MusicLM that converts text descriptions into high-quality music. The tool generates audio at 24 kHz that remains consistent over several minutes.


In a research paper published on Thursday, Google claimed that the tool significantly outperforms other systems in both audio quality and adherence to the text brief. The researchers added that the tool can also take a whistled or hummed melody and transform it to match the style described in the text.

MusicLM is similar in concept to DALL-E, OpenAI's image generator, but produces music instead of pictures. The research paper was written by 13 researchers and is accompanied by a large set of samples, including five-minute clips. The clips span melodic techno, swing, jazz and more.

At present, Google has no plans to release the model, acknowledging that the tool carries several risks. In the paper, Google said, "We strongly emphasize the need for more future work in tackling these risks associated with music generation," adding that it has "no plans to release models at this point."

The generated samples are likely to reflect biases present in the training data, which raises questions about the music genres and cultures that are underrepresented in it. Google also noted that the tool raises concerns about cultural appropriation.

In the future, the company may focus on lyrics generation. It also plans to improve the tool's text conditioning, so that the generated music follows the description more closely, as well as its vocal quality. Another area of work is modelling high-level song structure, which would help the tool generate music with an introduction, verse and chorus. Google also aims to produce higher-fidelity audio by modelling music at a higher sample rate.