OpenAI and Meta are being sued for copyright infringement in US District Court by a trio of authors including the comedian Sarah Silverman.
At the heart of the lawsuits are allegations that OpenAI's GPT-3.5 and GPT-4 and Meta's LLaMA large language models (LLMs) were trained using copyrighted works illegally scraped from book torrenting websites.
Alongside Silverman's works, including her 2010 memoir The Bedwetter, books by Chris Golden (Ararat) and Richard Kadrey (Sandman Slim) are cited as examples of the LLMs summarizing copyrighted works without attribution.
Sarah Silverman vs OpenAI and Meta Explained
While closely linked, the lawsuits being leveled by the authors against OpenAI, which owns ChatGPT, and Meta are actually two separate filings.
The main exhibits cited against OpenAI are ChatGPT responses to prompts for summaries of the works of Silverman and her co-plaintiffs. Here, the primary complaint is that these digests are produced without “any of the copyright management information Plaintiffs included with their published works.”
In the complaint against Meta, the authors claim that the datasets used by the Facebook, Instagram and Threads owner to train its LLaMA model include illegally acquired materials gleaned from “shadow library” websites such as Bibliotik, a torrenting website for books that works much as the infamous Pirate Bay does for movie and TV show downloads.
Multiple Literary Lawsuits Leveled at OpenAI
The dual actions featuring Sarah Silverman and her works are just the latest literary lawsuits to be leveled at OpenAI in quick succession.
Having teamed up in November 2022 under the LLM Litigation banner, California-area lawyers Joseph Saveri and Matthew Butterick are spearheading the legal actions on behalf of both Silverman's trio and two more authors, Paul Tremblay and Mona Awad. Filed at the end of June, Tremblay and Awad's lawsuit similarly accuses the OpenAI language models underlying ChatGPT of “ingesting” their copyrighted works when collecting data across the internet to help train the chatbot.
In addition to these claims, Saveri and Butterick are overseeing lawsuits on behalf of developers (vs. AI coding platform GitHub Copilot, in which Microsoft is named a co-defendant) and artists (vs. AI image generator Stable Diffusion).
Separately, OpenAI is facing a further $3 billion class action lawsuit from a group of anonymous individuals who claim that the “secret scraping” of data conducted to train ChatGPT models amounts to nothing less than “data theft.”
Overall, initial enthusiasm for AI technology and its flag-bearer seems to be waning. ChatGPT use has declined for the first time since its public launch, while real questions are starting to emerge over how ChatGPT saves your data.
These are just some of the many ethical issues starting to emerge around artificial intelligence and companies like OpenAI, and the lawsuits starting to flood in from the creative industries are potentially just the tip of the iceberg. Who will have the last laugh? If not the robots themselves, then it really does seem to be anyone's guess.