New RSL Standard Aims to Stop Unpaid AI Content Scraping

With RSL, sites can embed licensing requirements directly on their website, making AI companies pay for the data.

Reddit, Yahoo, Medium, wikiHow, and many more content-publishing websites have banded together to keep AI companies from scraping their content without compensation.

They’re creating “Really Simple Licensing” (RSL), an open licensing standard that allows online publishers to set up machine-readable licensing terms for their content.

Major AI companies have been getting into legal trouble for years now for allegedly stealing massive amounts of data off the internet. This new infrastructure is meant to give them a simple way to pay the people who provide the training data that AI companies everywhere will need to stay competitive.

What Is Really Simple Licensing?

The RSL Standard is an upgrade on the robots.txt protocol, which lets websites specify which areas of a site will or won’t accept content-scraping bots. AI companies use those bots to collect data, which they then feed to the AI models behind generative chatbots like ChatGPT or Gemini.

With RSL, sites can embed licensing and royalty requirements directly in robots.txt files. They can do the same with books, videos, or other training datasets, too.
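To make the robots.txt idea concrete, here is a minimal Python sketch of how a crawler might look for licensing information before scraping. It assumes, for illustration only, that a site advertises its terms through a `License:` line in robots.txt that points to a machine-readable terms file. That directive name, the example URL, and the `find_license_urls` helper are assumptions, not details from the announcement, so check the published RSL specification for the actual syntax.

```python
def find_license_urls(robots_txt: str) -> list[str]:
    """Return the values of any 'License:' lines found in a robots.txt body.

    The 'License' field name is an assumption made for this sketch.
    """
    urls = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        # Split only on the first colon so URLs keep their "https:" part.
        field, value = line.split(":", 1)
        if field.strip().lower() == "license":
            urls.append(value.strip())
    return urls


# Hypothetical robots.txt content for a publisher's site.
SAMPLE = """\
User-agent: *
Disallow: /private/
License: https://example.com/license.xml
"""

print(find_license_urls(SAMPLE))
```

A crawler built this way would fetch the linked terms file and decide whether it can meet the stated conditions before collecting any pages.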

 


With the RSL Standard, online publishers can ask AI companies to pay an ongoing subscription or a one-time pay-per-crawl fee.

Currently, a few specific publications (like The Wall Street Journal’s owner or The New York Times) have set up licensing agreements with certain AI companies, as The Verge notes in its coverage. This new standard could let many other publishers streamline and speed up that type of process.

Content Publishers Hoping to Fight Back

The list of publishers who have thrown their weight behind this new standard is a murderers’ row of the websites that stand to lose the most from the rise of AI-generated content.

Reddit, Yahoo, Medium, Quora, People Inc.: They’re all websites that many people turn to for answers to everything from niche hobbies to science homework to do-it-yourself home repairs.

Today’s chatbots need up-to-date versions of that type of data in order to deliver accurate responses, even if the same data sometimes leads to hallucinations too, as in the case of a Google AI Overview that turned a sarcastic Reddit post into a serious recommendation to put glue on a pizza.

Will This Stop AI Companies From Getting Sued Over and Over?

Back in 2023, OpenAI dealt with a $3 billion class action lawsuit related to the “secret scraping” of data used to train ChatGPT models, in addition to a separate suit against both OpenAI and Meta from authors accusing them of copyright infringement.

Today, similar cases are still making headlines. Anthropic just settled a class action lawsuit over hundreds of thousands of pirated books that were used to train AI chatbots. The $1.5 billion payout may still be subject to changes.

Given that track record, it’s easy to see why AI companies and online publishers are both interested in reaching a compromise.

If everyone wants to play ball, this new standard may be the best chance yet that AI-skeptical writers have of keeping their content from being scooped up by AI.


Written by:
Adam has been a writer at Tech.co for nine years, covering fleet management and logistics. He has also worked at the logistics newsletter Inside Lane, and has worked as a tech writer, blogger and copy editor for more than a decade. He was a Forbes Contributor on the publishing industry, for which he was named a Digital Book World 2018 award finalist. His work has appeared in publications including Popular Mechanics and IDG Connect, and his art history book on 1970s sci-fi, 'Worlds Beyond Time,' was a 2024 Locus Awards finalist. When not working on his next art collection, he's tracking the latest news on VPNs, POS systems, and the future of tech.