Andrea Agostinelli, Timo I. Denk, Zalán Borsos, Jesse Engel, Mauro Verzetti, Antoine Caillon, Qingqing Huang, Aren Jansen, Adam Roberts, Marco Tagliasacchi, Matt Sharifi, Neil Zeghidour, Christian Frank
Google Research
Abstract
We introduce MusicLM, a model generating high-fidelity music from text descriptions such as "a calming violin melody backed by a distorted guitar riff". MusicLM casts the process of conditional music generation as a hierarchical sequence-to-sequence modeling task, and it generates music at 24 kHz that remains consistent over several minutes. Our experiments show that MusicLM outperforms previous systems both in audio quality and adherence to the text description. Moreover, we demonstrate that MusicLM can be conditioned on both text and a melody in that it can transform whistled and hummed melodies according to the style described in a text caption. To support future research, we publicly release MusicCaps, a dataset composed of 5.5k music-text pairs, with rich text descriptions provided by human experts.
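To put "24 kHz ... consistent over several minutes" in concrete terms, the sketch below counts raw audio samples per channel for a few durations. It is plain arithmetic, not part of MusicLM: the helper name `num_samples` is hypothetical, and the durations are illustrative because the text does not quantify "several minutes".

```python
# Sample rate stated in the abstract: MusicLM generates audio at 24 kHz.
SAMPLE_RATE_HZ = 24_000


def num_samples(duration_s: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of audio samples (per channel) covering `duration_s` seconds."""
    return round(duration_s * sample_rate_hz)


# "Several minutes" is not quantified in the text; these durations are illustrative.
for minutes in (1, 3, 5):
    print(f"{minutes} min -> {num_samples(minutes * 60):,} samples")
```

At 24,000 samples per second, one minute is 1,440,000 samples and five minutes is 7,200,000 samples per channel, which shows how long the output sequences become at this sample rate.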
Caption | Generated audio |
---|---|
The main soundtrack of an arcade game. It is fast-paced and upbeat, with a catchy electric guitar riff. The music is repetitive and easy to remember, but with unexpected sounds, like cymbal crashes or drum rolls. | |
A fusion of reggaeton and electronic dance music, with a spacey, otherworldly sound. Induces the experience of being lost in space, and the music would be designed to evoke a sense of wonder and awe, while being danceable. | |
A rising synth is playing an arpeggio with a lot of reverb. It is backed by pads, sub bass line and soft drums. This song is full of synth sounds creating a soothing and adventurous atmosphere. It may be playing at a festival during two songs for a buildup. | |
Slow tempo, bass-and-drums-led reggae song. Sustained electric guitar. High-pitched bongos with ringing tones. Vocals are relaxed with a laid-back feel, very expressive. | |
Text prompt | Generated audio |
---|---|
melodic techno | |
swing | |
relaxing jazz | |
Text prompts | Generated audio |
---|---|
time to meditate; time to wake up; time to run; time to give 100% | |
electronic song played in a videogame; meditation song played next to a river; fire; fireworks | |
jazz song; pop song; rock song; death metal song; rap song; string quartet with violins; epic movie soundtrack with drums; scottish folk song with traditional instruments | |
melody prompt → text prompt ↓ | bella ciao - humming | bella ciao - jingle bells - whistling | mozart symphony25 - whistling | ode to joy - humming | fingerstyle guitar | jingle bells - marimba | twinkle twinkle little star - piano | when the saints go marching in - strings |
---|---|---|---|---|---|---|---|---|
a cappella chorus | ||||||||
electronic synth lead | ||||||||
guitar solo | ||||||||
jazz with saxophone | ||||||||
opera singer | ||||||||
piano solo | ||||||||
string quartet | ||||||||
tribal drums and flute | ||||||||
Painting title and author | Painting image (from Wikipedia) | Painting description | Generated audio |
---|---|---|---|
The Persistence of Memory - Salvador Dalí | | "His melting-clock imagery mocks the rigidity of chronometric time. The watches themselves look like soft cheese—indeed, by Dali's own account they were inspired by hallucinations after eating Camembert cheese. In the center of the picture, under one of the watches, is a distorted human face in profile. The ants on the plate represent decay." By Gromley, Jessica. "The Persistence of Memory". Encyclopedia Britannica, 14 Apr. 2022. | |
Napoleon Crossing the Alps - Jacques-Louis David | | "The composition shows a strongly idealized view of the real crossing that Napoleon and his army made across the Alps through the Great St Bernard Pass in May 1800." By Wikipedia | |
Dance - Henri Matisse | | "Made early in his career, Matisse's Dance, 1910, shows a group of red dancers caught in a collective moment of innocent freedom and joy, holding hands as they whirl around in space. Simple and direct, the painting speaks volumes about our deep-rooted, primal human desire for connection, movement, rhythm and music." By thecollector.com | |
The Scream - Edvard Munch | | "Inspired by a hallucinatory experience in which Munch felt and heard a scream throughout nature, it depicts a panic-stricken creature, simultaneously corpse like and reminiscent of a sperm or fetus, whose contours are echoed in the swirling lines of the blood-red sky." By Zaczek, Iain. "The Scream". Encyclopedia Britannica, 14 Apr. 2022. | |