New RSL Standard Aims to Stop Unpaid AI Content Scraping

With RSL, sites can embed licensing requirements directly on their website, making AI companies pay for the data.


Published on September 10, 2025

Reddit, Yahoo, Medium, wikiHow, and many more content-publishing websites have banded together to keep AI companies from scraping their content without compensation.

They’re backing “Really Simple Licensing” (RSL), an open licensing standard that allows online publishers to set up machine-readable licensing terms for their content.

For years now, major AI companies have faced legal trouble for allegedly taking massive amounts of data from the internet without permission. The new standard aims to give them a simple way to pay the people who supply the training data that AI companies will need to stay competitive.

What Is Really Simple Licensing?

The RSL Standard builds on the robots.txt protocol, the long-standing convention that lets websites tell crawlers which areas they may or may not access. AI companies use these content-scraping bots to collect data, which they then feed to the models behind generative AI chatbots like ChatGPT or Gemini.

With RSL, sites can embed licensing and royalty requirements directly in robots.txt files. They can do the same with books, videos, or other training datasets, too.
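To make the mechanism more concrete, here is a minimal Python sketch of how a crawler might read such a file. It assumes the robots.txt carries a `License:` line pointing to the machine-readable terms; that directive name, the `ExampleAIBot` user agent, the example.com URLs, and the `find_license_urls` and `check` helpers are all illustrative assumptions, not taken from the RSL specification. Python's standard `urllib.robotparser` handles the familiar Allow/Disallow rules and silently ignores lines it does not recognize, so the sketch extracts the license pointer separately.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an RSL-style setup. The "License:" directive,
# the bot name, and every URL here are illustrative assumptions.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /premium/

User-agent: *
Allow: /

License: https://example.com/license.xml
"""


def find_license_urls(robots_txt: str) -> list[str]:
    """Return the value of every 'License:' line (assumed directive name)."""
    urls = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key.strip().lower() == "license":
            urls.append(value.strip())
    return urls


def check(robots_txt: str, agent: str, url: str) -> bool:
    """Standard Allow/Disallow check; robotparser ignores unknown directives."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(find_license_urls(ROBOTS_TXT))
    print(check(ROBOTS_TXT, "ExampleAIBot", "https://example.com/premium/story"))
    print(check(ROBOTS_TXT, "ExampleAIBot", "https://example.com/news/story"))
```

Note that robots.txt is advisory: nothing in the file itself forces a crawler to honor either the access rules or the license pointer.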


With the RSL Standard, online publishers can ask AI companies to pay either an ongoing subscription or a pay-per-crawl fee charged each time their content is scraped.
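Choosing between the two payment models comes down to simple arithmetic. The sketch below uses invented prices and a hypothetical `LicenseOffer` type (nothing here comes from the RSL specification) to show how a crawler operator might compare a flat monthly subscription against a per-crawl fee: below the break-even crawl count, paying per crawl is cheaper; above it, the subscription wins.

```python
from dataclasses import dataclass


@dataclass
class LicenseOffer:
    """Hypothetical terms a publisher might advertise (all numbers invented)."""
    monthly_subscription: float  # flat fee per month, in dollars
    per_crawl_fee: float         # fee charged for each crawl, in dollars


def monthly_cost(offer: LicenseOffer, crawls_per_month: int) -> dict[str, float]:
    """Monthly cost under each payment model for a given crawl volume."""
    return {
        "subscription": offer.monthly_subscription,
        "pay_per_crawl": offer.per_crawl_fee * crawls_per_month,
    }


def break_even_crawls(offer: LicenseOffer) -> float:
    """Number of crawls per month at which both models cost the same."""
    return offer.monthly_subscription / offer.per_crawl_fee


offer = LicenseOffer(monthly_subscription=500.0, per_crawl_fee=0.25)
print(break_even_crawls(offer))    # 2000.0
print(monthly_cost(offer, 1_000))  # {'subscription': 500.0, 'pay_per_crawl': 250.0}
print(monthly_cost(offer, 5_000))  # {'subscription': 500.0, 'pay_per_crawl': 1250.0}
```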

Currently, only a handful of publishers, such as the owner of The Wall Street Journal and The New York Times, have struck licensing deals with individual AI companies, as The Verge notes in its coverage. The new standard could let many other publishers set up similar arrangements faster and with less effort.

Content Publishers Hoping to Fight Back

The publishers that have thrown their weight behind the new standard read like a murderers’ row of the websites with the most to lose from the rise of AI-generated content.

Reddit, Yahoo, Medium, Quora, and People Inc. all run websites that many people turn to for answers on everything from niche hobbies to science homework to do-it-yourself home repairs.

Today’s chatbots need up-to-date data of this kind to deliver accurate responses, although relying on it can backfire: one of Google’s AI Overviews turned a sarcastic Reddit post into a serious recommendation to put glue on pizza.

Will This Stop AI Companies From Getting Sued Over and Over?

Back in 2023, OpenAI faced a $3 billion class action lawsuit over the alleged “secret scraping” of data used to train ChatGPT models, as well as a separate copyright infringement suit brought by authors against both OpenAI and Meta.

Similar cases are still making headlines today. Anthropic recently agreed to a $1.5 billion settlement in a class action lawsuit over hundreds of thousands of pirated books used to train its AI chatbots, though the payout may still change before the settlement is finalized.

With this much money at stake, it’s easy to see why both AI companies and online publishers are interested in reaching a compromise.

If everyone is willing to play ball, this new standard may be AI-skeptical writers’ best chance yet of keeping their content from being scooped up by AI without compensation.