Amazon and MailChimp Login Details Found in AI Training Data

Almost 12,000 'data secrets', including logins and API keys, found hardcoded in AI training dataset.

Thousands of details, including login credentials for Amazon Web Services (AWS), MailChimp, and WalkScore, have been found in an AI training dataset used by the likes of DeepSeek.

In September, a report from Deloitte suggested that almost three out of every four professionals the company surveyed put data privacy among their top three concerns surrounding the rapid rollout of generative AI tools.

Of equal concern is where the data used to train AI models comes from, and news of leaks like this one will only reinforce those fears.

Open-Source Dataset Trawled

The secret data was found by cybersecurity researchers, who trawled through 400 terabytes of information.

This was collected from 2.67 billion web pages archived in 2024 and held by Common Crawl. The non-profit maintains an open-source repository of web data, which it started collecting in 2008. It is free for anyone to use, which makes it popular with LLM developers. It hosts around 250 petabytes of web data, and more is added constantly.

It was a team at Truffle Security that analyzed this data and found almost 12,000 “valid secrets”, including API keys and passwords, hardcoded in the archive.

Login Details Found in AI Dataset

The Truffle team said that they were prompted to carry out the analysis after wondering why Large Language Models (LLMs) were instructing developers to hardcode API keys.
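
To illustrate what hardcoding means here, the following is a minimal Python sketch of the pattern and a common alternative. It is not taken from the Truffle Security report; the environment variable name `MAILCHIMP_API_KEY` and the helper `load_api_key` are hypothetical, chosen only for the example.

```python
import os

# Anti-pattern: a key embedded directly in source code, where it can end up
# on a public web page and then in a crawl (placeholder value, not a real key).
# API_KEY = "0123456789abcdef-us1"


def load_api_key(name: str = "MAILCHIMP_API_KEY") -> str:
    """Read a secret from the environment instead of the source file.

    The variable name is a hypothetical example, not a vendor requirement.
    """
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"Environment variable {name} is not set")
    return key
```

Keeping the key out of the source file means that publishing the code does not publish the credential along with it.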

In the blog post announcing its findings, the team says that it detected 11,908 “Live Secrets” and found that 2.76 million web pages contained live secrets.

Even more worryingly, it also found a high reuse rate of the secret details. It states that 63% were repeated across multiple web pages. “In one extreme case, a single WalkScore API key appeared 57,029 times across 1,871 subdomains,” the researchers write.
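
As a rough idea of how a reuse figure like this could be measured, here is a small Python sketch. It is an assumption-laden illustration, not the researchers' method: it only pattern-matches the widely documented shape of an AWS access key ID and does not check whether a key is live, which the Truffle team did. The function names `secret_reuse` and `reuse_rate` are hypothetical.

```python
import re
from collections import defaultdict

# Commonly documented shape of an AWS access key ID:
# "AKIA" followed by 16 uppercase letters or digits.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def secret_reuse(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each candidate key to the set of page URLs it appears on."""
    seen: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for candidate in AWS_KEY_ID.findall(text):
            seen[candidate].add(url)
    return dict(seen)


def reuse_rate(seen: dict[str, set[str]]) -> float:
    """Fraction of candidate keys that appear on more than one page."""
    if not seen:
        return 0.0
    repeated = sum(1 for urls in seen.values() if len(urls) > 1)
    return repeated / len(seen)
```

Calling `reuse_rate(secret_reuse(pages))` on a dictionary of page URL to page text returns the share of matched keys found on more than one page. A real scanner would cover many more secret formats and verify each match against the vendor before counting it.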

Impact of Login Details Being Exposed by AI

Needless to say, any login data that is discovered is bad news for the original account holders, and leaves them highly vulnerable. Because these datasets are being used by some of the world’s biggest AI pioneers, their AI tools could then be weaponized by cybercriminals to uncover login credentials when they want to launch an attack.

Truffle Security is reported to now be working with the vendors to help fix the issue. It has also issued some advice for the AI industry as a whole. It writes: “LLMs may benefit from improved alignment and additional safeguards – potentially through techniques like Constitutional AI – to reduce the risk of inadvertently reproducing or exposing sensitive information.”

This technique has been developed by Anthropic – one of the few AI players to consistently push for an AI safety framework to be put in place. Others – including OpenAI – seem keen to push ahead with innovation at all costs – with the full support of the Trump administration.

Written by:
Katie has been a journalist for more than twenty years. At 18 years old, she started her career at the world's oldest photography magazine before joining the launch team at Wired magazine as News Editor. After a spell in Hong Kong writing for Cathay Pacific's inflight magazine about the Asian startup scene, she is now back in the UK. Writing from Sussex, she covers everything from nature restoration to data science for a beautiful array of magazines and websites.