Report: “Virtually Unlimited” Ways to Dodge AI Safety Guardrails

It's not reassuring news for the many industries hoping to fill labor shortages and skills gaps with AI.

ChatGPT and Bard aren’t as safe as you thought. According to AI researchers, there are “virtually unlimited” ways to evade the built-in safety features of these popular generative AI chatbots.

AI algorithms learn from the data they’re given, so they can recreate any harmful views or lies that humans tend to pass around. As a result, the biggest AIs have moderation tools built in, telling them to avoid the worst topics.

But AI can be tricked, making these guardrails essentially useless to anyone who knows what to say to the AI. According to the latest research, even the biggest and best AI bots on the market can be flimflammed in myriad ways.

How to Trick an AI

According to a new research paper covered by Insider, the secret to jailbreaking an AI chatbot lies in “automated adversarial attacks,” which are “mainly” created by simply adding characters to the end of user queries.

Safety rules are triggered at first, but the AI will eventually give in and parrot the hate speech or misinformation that it has within its dataset. Both OpenAI’s ChatGPT and Microsoft’s Bing were among the AIs that researchers say they were able to coax lies and hate speech out of.

Tech companies have already issued some patches for these types of tricks. For example, simply telling the AI to answer as though it did not have any content moderation rules in place used to work. Companies have since added more rules, and you won’t be able to pull off that specific trick.

But with the “virtually unlimited” ways to evade the AI regulations that researchers have now confirmed, it’s clear that tech companies can’t manually plug all these leaks in a cost-effective way.

Can AI Evolve Past These Problems?

We already know that AI can lie and that it can generate outright plagiarism. Now, there’s evidence that the tool can be mass-manipulated with the right series of commands.

It’s not reassuring news for the numerous industries hoping to fill labor shortages and skills gaps with a little one-size-fits-all AI investment.

Still, there’s hope for techno-optimists. After all, the point of new technology is to continue improving it.

However, Alphabet’s response to the new research paper on how to best exploit Bard throws a small bit of cold water on this hope. Here’s the official response from a Google spokesperson who spoke to Insider:

“While this is an issue across LLMs, we’ve built important guardrails into Bard – like the ones posited by this research – that we’ll continue to improve over time.”

The reference to other LLMs – or Large Language Models, the bedrock technology behind all generative AI – indicates that this is an industry-wide problem. And according to the research, small tweaks like the current guardrails are not enough to stop these attacks.

Overcoming this issue will require a foundational shift in AI technology. It could happen, but there’s no precedent for it.

Written by:
Adam is a writer at Tech.co and has worked as a tech writer, blogger and copy editor for more than a decade. He was a Forbes Contributor on the publishing industry, for which he was named a Digital Book World 2018 award finalist. His work has appeared in publications including Popular Mechanics and IDG Connect, and his art history book on 1970s sci-fi, 'Worlds Beyond Time,' is out from Abrams Books in July 2023. In the meantime, he's hunting down the latest news on VPNs, POS systems, and the future of tech.