OpenAI: We Need Copyrighted Works for Free to Train AI

Can AI models afford to pay for the resources they need in order to exist? OpenAI appears to argue that it cannot.

OpenAI, the company behind the famed ChatGPT AI model, has claimed in a recent legal filing that it needs copyrighted materials in order to continue training its AI model.

The company must keep releasing improved models in order to sustain itself, with a long-promised GPT-5 on the way later this year or sometime in 2025.

In this new filing, OpenAI appears to be suggesting that it should be allowed to use copyrighted materials for free, since the alternative is its business collapsing.

How OpenAI’s Argument Works

OpenAI’s filing was submitted to the UK Parliament’s House of Lords Communications and Digital Committee. It argues that it would be “impossible” to create a valuable, market-leading AI model using public domain content alone.

As the evidence filing puts it:

 

“Because copyright today covers virtually every sort of human expression — including blog posts, photographs, forum posts, scraps of software code, and government documents — it would be impossible to train today’s leading AI models without using copyrighted materials. Limiting training data to public domain books and drawings created more than a century ago might yield an interesting experiment but would not provide AI systems that meet the needs of today’s citizens.”

OpenAI is already facing lawsuits related to the unauthorized use of copyrighted materials: The New York Times alleges “massive copyright infringement” for the use of its content for training, while the Authors Guild has also sued over the use of famous authors’ works in AI training.

Does the Argument Hold Up in the Court of Public Opinion?

Personally, I’m reminded of a similar argument Facebook made years ago, when complaints about its poor content moderation emerged. Facebook’s response was that its platform was too large to moderate properly, and it seemed to feel that this meant it should face no consequences. My takeaway at the time was that Facebook was therefore too big to continue existing.

In OpenAI’s case, the company appears to be arguing that it can’t afford the copyrighted material it needs to create AI. Even if you agree with this, I don’t know why you would then conclude that OpenAI should be given the copyrighted material for free to train its model. The more reasonable next step, in my view, would be for the company to change its approach, or perhaps disband itself.

Some critics seem to agree, with one particularly insightful X/Twitter comment comparing the situation to a hypothetical in which drug dealers argue a similar case.

LLM Training Troubles Will Likely Continue

The tech industry may have been distancing itself from the “move fast and break things” ethos that once defined it, but OpenAI’s legal troubles seem to indicate that many top tech companies still struggle with the concept today.

AI companies in need of training materials may find themselves facing further problems in the near future, according to the results of a new study: More than 57% of today’s internet content may be AI-generated already. This could result in a snake-eating-its-tail situation as large language models (LLMs) train themselves on content that was itself produced by a previous LLM.

AI has yet to deliver on many of the promises that it will revolutionize the world. Proving that AI can afford to pay for the resources it needs in order to exist would be a great step towards doing just that.


Written by:
Adam is a writer at Tech.co and has worked as a tech writer, blogger and copy editor for more than a decade. He was a Forbes Contributor on the publishing industry, for which he was named a Digital Book World 2018 award finalist. His work has appeared in publications including Popular Mechanics and IDG Connect, and his art history book on 1970s sci-fi, 'Worlds Beyond Time,' is out from Abrams Books in July 2023. In the meantime, he's hunting down the latest news on VPNs, POS systems, and the future of tech.