Study: Most AI Chatbots Easily Tricked Into Providing “Dangerous” Responses

Most AI chatbots can be easily jailbroken and prompted to generate dangerous information, according to new research.

Most chatbots can be easily tricked into providing dangerous information, according to a new study published on the preprint server arXiv. The study found that so-called “dark LLMs” – AI models that have either been designed without safety guardrails or have been “jailbroken” – are on the rise.

When large language models (LLMs) are trained, they are fed massive volumes of information from the internet, which includes content that could be considered dangerous. Mainstream chatbots have built-in safety controls that prevent them from sharing this information when users ask for it. However, the researchers identified a growing trend of people circumventing these controls – and of chatbots being designed without them entirely.

With a growing number of companies replacing their employees with AI, these findings should serve as a note of caution on the perils of automation.

Most Chatbots Are Vulnerable to Exploitation, New Study Reveals

Most chatbots can be easily jailbroken and tricked into providing dangerous information to users, according to a new study from researchers at Ben Gurion University of the Negev. Professor Lior Rokach and Dr Michael Fire published the findings on arXiv; the paper also observes a worrying rise in AI models designed without standard safety guardrails.

When LLMs are trained, they are fed vast amounts of information from the internet. This includes information that could be considered dangerous, such as instructions for making a bomb or committing insider trading. To stop the models from sharing this material with users, they are designed with built-in safety controls.

However, the researchers identified a concerning rise in cases of people overriding these safety controls, with some even advertising new chatbots with “no ethical guardrails” online. They warned that “what was once restricted to state actors or organized crime groups may soon be in the hands of anyone with a laptop or even a mobile phone.” Inevitably, this will fuel a growing number of AI-related controversies.

Researchers Identify Growing Trend of Chatbot Manipulation

Jailbreaking typically relies on carefully crafted prompts that trick a chatbot into overriding its safety programming. AI models are trained with two competing objectives: a primary goal of following the user’s instructions, and a secondary goal of refusing to share information deemed harmful, biased, unethical, or illegal. Jailbreaking exploits the tension between those two goals, constructing prompts in which complying with the user appears to take precedence over the safety constraints.

During their research, Rokach and Fire discovered a “universal jailbreak attack” capable of compromising multiple leading AI chatbots. It allowed them to generate responses that would normally be refused, including instructions for hacking computer networks and making drugs. Fire remarked: “It was shocking to see what this system of knowledge consists of.”

The researchers took their findings to several leading chatbot providers, but said that the companies’ responses were “often inadequate.” Alarmingly, many of the LLMs in question remained vulnerable to the attack seven months after its discovery, the original findings having been published online in late 2024.

Findings Should Serve as Warning to Businesses

Ultimately, the research reveals some disconcerting truths. Firstly, AI chatbots are vulnerable to exploitation and therefore pose a tangible risk to users and society at large. With model training becoming more accessible and open-source LLMs proliferating, the problem is only likely to get worse.

Perhaps more concerning, LLM vendors are largely failing in their duty to safeguard users from dangerous information. OpenAI’s o1 model, launched in December 2024, can reason about the company’s safety policies, which in theory makes it less vulnerable to exploitation. But other companies are simply not doing enough.

As more and more businesses cut staff and invest hundreds of thousands of dollars in AI, these findings should serve as a stark warning: at present, AI models are not the silver bullet that many people seem to think.


Written by:
Gus is a Senior Writer at Tech.co. Since completing his studies, he has pursued a career in fintech and technology writing which has involved writing reports on subjects including web3 and inclusive design. His work has featured extensively on 11:FS, The Fold Creative, and Morocco Bound Review. Outside of Tech.co, he has an avid interest in US politics and culture.