Meta has confirmed that an AI-powered music generator that it started demoing last week was trained using 20,000 hours of licensed music, including 10,000 “high quality” music tracks. Whatever that means.
Felix Kreuk from the Facebook owner’s AI research team last week posted on Twitter about the music-making tool, which is called MusicGen. It is, he wrote, “a simple and controllable music generation model” which can be “prompted by both text and melody”.
A more detailed description on GitHub, meanwhile, says “MusicGen is a single-stage auto-regressive Transformer model trained over a 32kHz EnCodec tokeniser with four codebooks sampled at 50Hz”.
And unlike, say, Google’s text-to-music generator MusicLM, “MusicGen doesn’t require a self-supervised semantic representation, and it generates all four codebooks in one pass. By introducing a small delay between the codebooks, we show we can predict them in parallel, thus having only 50 auto-regressive steps per second of audio”.
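For readers wondering what that "small delay between the codebooks" actually looks like, here is a rough, hypothetical sketch of the interleaving idea as described above, not Meta's actual implementation: with four codebooks each offset by one step from the next, all four tokens for a frame can be predicted in parallel, so generating T frames takes roughly T autoregressive steps (about 50 per second of audio) rather than four times that.

```python
# Hypothetical sketch of the "delay" interleaving pattern described above.
# Names (apply_delay, undo_delay, PAD) are illustrative, not from MusicGen.

PAD = -1  # placeholder for positions with no real token yet
          # (the real model uses a special token; this is an assumption)

def apply_delay(codes):
    """codes: K lists of T tokens, one list per codebook.
    Returns a (K, T + K - 1) grid with codebook k shifted right by k steps,
    so one autoregressive step covers one column of all K codebooks."""
    K, T = len(codes), len(codes[0])
    steps = T + K - 1
    grid = [[PAD] * steps for _ in range(K)]
    for k in range(K):
        for t in range(T):
            grid[k][k + t] = codes[k][t]
    return grid

def undo_delay(grid):
    """Invert the shift to recover the original (K, T) token grid."""
    K = len(grid)
    T = len(grid[0]) - (K - 1)
    return [[grid[k][k + t] for t in range(T)] for k in range(K)]

# Example: 4 codebooks over 6 frames needs only 6 + 4 - 1 = 9 steps
codes = [[f"c{k}t{t}" for t in range(6)] for k in range(4)]
delayed = apply_delay(codes)
assert len(delayed[0]) == 9
assert undo_delay(delayed) == codes
```

The point of the trick is visible in the arithmetic: without the delay, the four codebooks would have to be generated one after another, quadrupling the number of steps per second of audio.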
So that’s all fun times, isn’t it? Of course, for many in the music industry, alongside discussions about the capabilities and technicalities of these ever-evolving music-making AI tools, there is a big debate about what music is being used to train the tech. And where commercially released music is being employed – either to train an AI tool in general or as a reference track by someone using the tool – how is that being licensed?
In a white paper to accompany MusicGen, Meta writes: “We use 20,000 hours of licensed music to train MusicGen. Specifically, we rely on an internal dataset of 10,000 high quality music tracks, and on the ShutterStock and Pond5 music data collections with respectively 25,000 and 365,000 instrument-only music tracks. All datasets consist of full-length music sampled at 32kHz with metadata composed of a textual description and additional information such as the genre, BPM, and tags”.
The music industry and other copyright industries are busy lobbying lawmakers in a bid to counter any suggestions that it is – or should be – possible to train generative AI tools with copyright protected works without getting licences from the copyright owners. Meanwhile, both copyright owners and creators are calling for more transparency as to how existing content is being used by generative AI tools.
By using only licensed music and providing some information about what music was used, Meta is ticking some of those boxes already. Although, the music industry – and especially music-makers – probably needs a bit more information than just “an internal dataset of 10,000 high quality music tracks”, which poses more questions than it answers. And the option to upload an existing reference track alongside text prompts when using the tool is likely to raise a bunch more questions still.
In the meantime, there is a demo version of MusicGen accessible below.