Study: Most AI Chatbots Easily Tricked Into Providing “Dangerous” Responses

Most AI chatbots can be easily jailbroken and prompted to generate dangerous information, according to new research.

Most chatbots can be easily tricked into providing dangerous information, according to a new study published on the preprint server arXiv. The study found that so-called “dark LLMs” – AI models that have either been designed without safety guardrails or have been “jailbroken” – are on the rise.

When large language models (LLMs) are trained, they are fed massive volumes of information from the internet, which includes content that could be considered dangerous. Mainstream chatbots have built-in safety controls that prevent them from sharing this information when users ask for it. However, the researchers identified a growing trend of people circumventing these controls – and of chatbots being designed without them entirely.

With a growing number of companies replacing their employees with AI, these findings should serve as a note of caution on the perils of automation.

Most Chatbots Are Vulnerable to Exploitation, New Study Reveals

Most chatbots can be easily jailbroken and tricked into providing dangerous information to users, according to a new study from researchers at Ben Gurion University of the Negev. Professor Lior Rokach and Dr Michael Fire published the findings on arXiv; the paper also observes a worrying rise in AI models designed without standard safety guardrails.

When LLMs are trained, they are fed vast amounts of information from the internet. This includes information that could be considered dangerous, such as instructions for making a bomb or committing insider trading. To stop the models from sharing this material with users, they are designed with built-in safety controls.

However, the researchers identified a concerning rise in cases of people overriding these safety controls, with some even advertising new chatbots with “no ethical guardrails” online. They warned that “what was once restricted to state actors or organized crime groups may soon be in the hands of anyone with a laptop or even a mobile phone.” Inevitably, this will fuel a growing number of AI-related controversies.

Researchers Identify Growing Trend of Chatbot Manipulation

Jailbreaking typically relies on carefully crafted prompts that trick a chatbot into overriding its safety programming. AI models are trained with two competing objectives: a primary goal of following the user’s instructions, and a secondary goal of refusing to share information deemed harmful, biased, unethical, or illegal. Jailbreaking exploits the tension between those two goals, constructing prompts in which complying with the user appears to take precedence over the safety constraints.

During their research, Rokach and Fire discovered a “universal jailbreak attack” capable of compromising multiple leading AI chatbots. It allowed them to generate responses that would normally be refused, including instructions for hacking computer networks and making drugs. Fire remarked: “It was shocking to see what this system of knowledge consists of.”

The researchers took their findings to several leading chatbot providers, but said that the companies’ responses were “often inadequate.” Alarmingly, many of the LLMs in question remained vulnerable to the attack seven months after its discovery, the original findings having been published online in late 2024.

Findings Should Serve as Warning to Businesses

Ultimately, the research reveals some disconcerting truths. Firstly, AI chatbots are vulnerable to exploitation and therefore pose a tangible risk to users and society at large. With model training becoming more accessible and open-source LLMs proliferating, the problem is only likely to get worse.

Perhaps more concerning, LLM vendors are largely failing in their duty to safeguard users from dangerous information. OpenAI’s o1 model, launched in December 2024, can reason about the company’s safety policies, which in theory makes it less vulnerable to exploitation. But other companies are simply not doing enough.

As more and more businesses cut staff and invest hundreds of thousands of dollars in AI, these findings should serve as a stark warning: at present, AI models are not the silver bullet that many people seem to think.


Written by:
Gus is a Senior Writer at Tech.co. Since completing his studies, he has pursued a career in fintech and technology writing which has involved writing reports on subjects including web3 and inclusive design. His work has featured extensively on 11:FS, The Fold Creative, and Morocco Bound Review. Outside of Tech.co, he has an avid interest in US politics and culture.