OpenAI’s Attempts to Stop Future AI Going Rogue Have Had Mixed Results

OpenAI is trying to find ways to develop superintelligent AI safely, despite conflicting attitudes within the company.

OpenAI’s “Superalignment” team has announced a small breakthrough in how humans could rein in AI once it surpasses our level of intelligence – a possible future scenario known as superintelligence.

Researching superintelligent machines has limitations, though: even once the technology is realized, it may try to hide its true behavior from humans, according to Ilya Sutskever – OpenAI’s co-founder and the driving force behind Sam Altman’s temporary ousting as CEO.

While superintelligence remains hypothetical, OpenAI appears to be taking the concerns seriously, dedicating a fifth of its computing power to risk mitigation and investing $10 million into superalignment research.

OpenAI’s AGI Test is Promising, But Has Flaws

As OpenAI becomes increasingly concerned about the looming threat of AI superintelligence, its Superalignment team has just released its first research update – and its results are mixed.

The team, which was founded by Ilya Sutskever and OpenAI scientist Jan Leike, looked into how superintelligent systems could be supervised once they surpass human capabilities. Since these machines don’t exist yet, OpenAI’s control test used GPT-2 and GPT-4 as stand-ins.


More specifically, researchers tested how well the less sophisticated GPT-2 model would be able to supervise the company’s most powerful model, GPT-4. But what was their conclusion?

OpenAI graphic displaying its approach to superalignment

Well, after training GPT-2 to perform different tasks, including chess puzzles and sentiment analysis, and using these responses to train GPT-4, they found that OpenAI’s latest model performed 20-70% better than GPT-2, but still fell far short of its own potential.

GPT-4 also avoided many mistakes made by the inferior GPT-2 model. According to the researchers, this is evidence of a phenomenon called ‘weak-to-strong generalization’ that occurs when a model has implicit knowledge of how to perform tasks, and is able to execute them correctly despite receiving poor-quality instructions.
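The idea can be illustrated with a toy sketch – this is not OpenAI’s actual setup, just a minimal pure-Python analogy in which a noisy “weak supervisor” labels data and a “strong student” (here, a simple threshold learner whose hypothesis class matches the true task) is trained on those flawed labels, yet ends up more accurate than its supervisor:

```python
import random

random.seed(0)

# Toy task: classify whether x > 0. The "weak supervisor" knows the rule
# but mislabels 30% of examples, mimicking GPT-2's error-prone outputs.
def weak_label(x):
    y = x > 0
    return y if random.random() > 0.3 else not y

train = [random.uniform(-1, 1) for _ in range(2000)]
labels = [weak_label(x) for x in train]

# "Strong student": searches for the threshold that best fits the noisy
# labels. Because its hypothesis class matches the underlying task, it
# can recover the true rule despite poor-quality supervision.
def fit_threshold(xs, ys):
    best_t, best_acc = 0.0, 0.0
    for t in sorted(xs):
        acc = sum((x > t) == y for x, y in zip(xs, ys)) / len(xs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

t = fit_threshold(train, labels)

# Evaluate both against the TRUE rule on held-out data.
test = [random.uniform(-1, 1) for _ in range(2000)]
student_acc = sum((x > t) == (x > 0) for x in test) / len(test)
weak_acc = sum(weak_label(x) == (x > 0) for x in test) / len(test)
print(f"weak supervisor accuracy: {weak_acc:.2f}")
print(f"strong student accuracy:  {student_acc:.2f}")
```

The student ends up far more accurate than the ~70%-accurate labels it was trained on – a loose analogy for a strong model’s implicit knowledge letting it outperform its weak teacher.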

According to the researchers, evidence of weak-to-strong generalization in the test suggests this phenomenon may also be present when humans supervise superintelligent AI. If this is true, then when AGI models exist and are instructed not to cause catastrophic harm, they may be able to tell even better than their human supervisors when their actions enter dangerous territory.

However, despite these positive findings, GPT-4’s performance was still weakened after being trained by GPT-2, and the experiment didn’t guarantee the stronger model would behave perfectly under supervision. These shortcomings show that more research is needed before humans can be trusted as suitable supervisors of superintelligent models.

OpenAI: Concerns Around Superintelligence Are “Obvious”

Despite being a frontrunner in AI development, OpenAI hasn’t been shy in voicing the potential dangers of superintelligent AI.

Co-founder Sutskever recently said that preventing AI from going rogue is an “obvious” concern, and the company has even stated that the technology could be very dangerous, potentially leading to the disempowerment of humanity or even human extinction if left unchecked.

“We’re gonna see superhuman models, they’re gonna have vast capabilities and they could be very, very dangerous, and we don’t yet have the methods to control them.” – Leopold Aschenbrenner, OpenAI Researcher

But why are researchers so scared of these machines going rogue? Well, according to Sutskever, once superhuman AI models exceed human-level intelligence, they could become capable of masking their own behavior, opening us up to unpredictable and worrying circumstances.

OpenAI Claims to Be Taking AI Safety Seriously

While many experts believe these fears are overblown, OpenAI is already taking several steps to address safety concerns.

The company recently announced it would be investing $10 million into superalignment research, in the form of $2 million grants to university labs and $150,000 grants to individual graduate students. OpenAI also revealed it would be dedicating a fifth of its computing power to the Superalignment project, as it continues to preemptively research how to govern AGI.

Disputes around superintelligence and AI safety were even rumored to be the real reason behind Sam Altman’s shock, yet brief, ousting from the company last month. However, with Altman back at the helm, and OpenAI continuing to prioritize commercial interests with its AGI project ‘Q*’, the company could still be doing a lot more to mitigate risks to public safety.


Written by:
Isobel O'Sullivan (BSc) is a senior writer at with over four years of experience covering business and technology news. Since studying Digital Anthropology at University College London (UCL), she’s been a regular contributor to Market Finance’s blog and has also worked as a freelance tech researcher. Isobel’s always up to date with the topics in employment and data security and has a specialist focus on POS and VoIP systems.