Comedian and writer Sarah Silverman has joined with fellow authors Christopher Golden and Richard Kadrey to file copyright lawsuits against OpenAI and Meta. The trio of writers accuse the companies of copyright infringement for allegedly training their ChatGPT and LLaMA generative AI models with copyright-protected books they have written – without obtaining a licence to use the content of the books for that purpose.
Generative AI models – which generate content of one kind or another – need to be “trained”, a process which essentially means showing the AI the type of output it is expected to generate.
This requires providing the model with significant amounts of existing content, which raises a number of questions around what content has been used, and whether that use needs a licence from whoever owns the copyright in the original material.
With generative AI becoming rapidly more sophisticated, and with more companies entering the field, answering those questions has become more urgent.
Copyright owners from the media and entertainment industries – including those that control the rights in books, photography and music – are adamant that any training of an AI model with existing content exploits the copyright in that content, and therefore the makers of the AI tools must license any content they use for training from the relevant copyright owners.
However, not everyone on the tech side agrees. Some in the AI space argue that their use of copyright-protected material is covered by exceptions that exist in the copyright systems of the countries where their servers are based, or that training generative AI may be permitted under the somewhat tricky concept of fair use in US law. On that basis, they say, they do not need permission from the owners of any copyrights.
The copyright industries, including the music industry, have been pushing governments and lawmakers to clear up any ambiguity in this domain and ensure that the makers of generative AI are required to secure licences. Meanwhile, lawsuits that test what copyright law currently says in this domain are starting to stack up.
Two other authors – Paul Tremblay and Mona Awad – filed a copyright infringement lawsuit against OpenAI, the maker of ChatGPT, last month. Getty Images has also sued Stability AI, the company behind the visual generative AI platform Stable Diffusion, in both the UK and US courts.
None of these are music-specific cases, but they could nevertheless set precedents of huge importance to the music industry.
The new lawsuit claims “much of the material” in the training datasets used by OpenAI and Meta to train ChatGPT and LLaMA respectively “comes from copyrighted works – including books written by plaintiffs – that were copied … without consent, without credit, and without compensation”.
“Many kinds of material have been used to train large language models”, the lawsuit adds. “Books, however, have always been a key ingredient in training datasets for large language models because books offer the best examples of high-quality longform writing”.
One challenge for copyright owners seeking to enforce their rights against the makers of generative AI tools is that it’s not always clear what specific material has been used to train the AI.
Another demand from copyright owners, therefore, is that a transparency obligation should be put into law so that AI companies are obliged to clearly state what data and content they used to train their generative AI models.
The companies behind some of the generative AI models for music creation have actually said in accompanying documents that those models were trained on “licensed music”, although it sometimes isn’t clear what specific music has been used.
In the meantime, the two new lawsuits against OpenAI and Meta dissect what the two companies have said about how they trained their technologies in order to reach some assumptions regarding what copyright-protected materials may have been exploited.
The lawsuits accuse the technology companies of direct copyright infringement and vicarious copyright infringement, as well as unfair competition and unjust enrichment. They also accuse the tech firms of removing copyright management information linked to the copyright-protected works they exploited, in violation of US copyright rules.
We wait to see how OpenAI and Meta respond – and also what the next copyright infringement lawsuit against an AI platform might be.