
Google has unveiled MusicLM, a new AI tool that generates high-fidelity music from text descriptions. It can understand prompts like “a calming violin melody backed by a distorted guitar riff” and turn them into musical compositions.
This is a significant advancement in AI-generated music that could reshape how music is created and consumed.
MusicLM: Generating Music From Text
The tool is simple to use, making it accessible to a wide range of users. It generates music at 24 kHz using a hierarchical sequence-to-sequence modelling approach, producing audio that stays consistent over several minutes.
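The hierarchical approach can be pictured as coarse-to-fine generation: a short sequence of "semantic" tokens fixes the long-term structure, and a much longer sequence of "acoustic" tokens fills in the audio detail. The sketch below is a toy illustration of that idea only; the function names, token rates, and vocabulary sizes are illustrative assumptions, not MusicLM's actual implementation.

```python
import numpy as np

SAMPLE_RATE = 24_000  # MusicLM outputs audio at 24 kHz

def semantic_stage(text_embedding, n_tokens, rng):
    """Toy stand-in for the first stage: map a text embedding to a
    coarse 'semantic' token sequence capturing long-term structure."""
    return rng.integers(0, 1024, size=n_tokens)

def acoustic_stage(semantic_tokens, tokens_per_semantic, rng):
    """Toy stand-in for the second stage: expand each semantic token
    into finer-grained 'acoustic' tokens that decode to waveform frames."""
    return rng.integers(0, 1024, size=len(semantic_tokens) * tokens_per_semantic)

def generate(text_embedding, seconds, rng, semantic_rate=25, tokens_per_semantic=24):
    # Coarse-to-fine hierarchy: the semantic sequence is short (here,
    # an assumed 25 tokens/s), so staying coherent over minutes is cheap;
    # the acoustic sequence carries the detail.
    semantic = semantic_stage(text_embedding, seconds * semantic_rate, rng)
    acoustic = acoustic_stage(semantic, tokens_per_semantic, rng)
    return semantic, acoustic

rng = np.random.default_rng(0)
semantic, acoustic = generate(text_embedding=None, seconds=10, rng=rng)
print(len(semantic), len(acoustic))  # 250 semantic vs. 6000 acoustic tokens
```

The design point is that coherence over minutes only has to be maintained in the short semantic sequence, not in the raw audio.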

Evaluations show that MusicLM outperforms earlier systems in both audio quality and adherence to the text description. It can also take both text and an existing melody as input, allowing it to transform whistled and hummed melodies into the style described in a text caption. To support future research, the developers have publicly released MusicCaps, a dataset of 5.5k music-text pairs with rich text descriptions provided by human experts.
MusicLM has a number of notable features:
Audio Generation From Rich Captions: MusicLM can create complex and nuanced compositions from simple text descriptions.
Long Generation: MusicLM can generate music that is consistent over a period of several minutes.
Story Mode: It allows users to create music that tells a story; it can be used for music scoring for movies, television shows, and other forms of media.
Text and Melody Conditioning: MusicLM can accept both text and an existing melody as input, transforming whistled and hummed melodies into the style described in a text caption.
Painting Caption Conditioning: It produces music matching the mood and aesthetic conveyed by a painting’s caption.
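Story mode, listed above, can be pictured as a caption schedule over the timeline: the model is re-conditioned on a new prompt at each boundary while keeping the audio coherent across them. The sketch below is a hypothetical illustration; the example captions and the `prompt_at` helper are assumptions for this post, not part of any released API.

```python
# A "story" as a schedule of (start_second, caption) pairs.
story = [
    (0, "time to meditate"),
    (15, "time to wake up"),
    (30, "time to run"),
]

def prompt_at(story, t):
    """Return the caption that would condition generation at second t."""
    active = story[0][1]
    for start, caption in story:
        if t >= start:
            active = caption
    return active

print(prompt_at(story, 20))  # "time to wake up"
```

For film or TV scoring, such a schedule could be derived directly from scene boundaries.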
In addition, MusicLM can condition its output on attributes such as a musician’s experience level, a place, or a historical era (for example, an accordion solo played by a beginner versus a professional). It can also generate diverse samples for the same text prompt, as well as variations that share the same semantic tokens but differ in acoustic detail, among many other things.