Study: ChatGPT Gets Medical Diagnoses Wrong Half of the Time

AI is not yet advanced enough to diagnose complex medical problems, new study finds.

AI should not be relied on for medical diagnoses, according to new research published in the journal PLOS ONE. The study, conducted at Western University in London, Ontario, found that, when presented with a series of medical cases, ChatGPT gave the correct diagnosis less than half of the time.

The researchers asked it to choose the correct diagnosis from a set of options and to provide an explanation for its choice. It was correct just 49% of the time, although it proved adept at simplifying complex medical terminology.

Amid a flurry of activity in the healthcare space, with researchers keen to explore potential use cases for the fast-growing technology, these findings suggest that robots will not be prowling the hospital wards anytime soon.

AI Not Ready for Healthcare

Researchers presented ChatGPT with 150 complex medical cases. The platform was asked to select the correct diagnosis from multiple-choice options and to give its rationale. The team observed that it was right only 49% of the time, although its simplified answers were competent and sounded convincing.

Published in July, the study set out to evaluate the “diagnostic accuracy and utility of ChatGPT in medical education,” according to CBC. Lead researcher Dr. Amrit Kirpalani said: “We wanted to know, how would it deal with…those complicated cases that we see in medicine?”


While the 49% accuracy rate will do little to calm the ongoing debate around AI misinformation, researchers were encouraged by the platform’s capacity to simplify complex medical terminology. Kirpalani continued: “I think we can harness this for education.”

Researchers Exploring Potential Use Cases

These findings are another twist in what is turning out to be a long-running saga, with researchers determined to find use cases for AI within the healthcare industry. A Stanford University study recently set out to evaluate whether LLMs could be used to diagnose OCD, a notoriously difficult condition to identify.

Remarkably, AI was found to outperform healthcare professionals in several instances, with ChatGPT-4 correctly identifying OCD in every patient it was presented with. By contrast, psychology doctoral trainees were only able to diagnose OCD 81.5% of the time, with primary care physicians coming in at 49.5%.

The Western University study was originally conducted in 2023 using ChatGPT 3.5. In light of the Stanford findings, the scientists can only speculate as to how an updated model would perform when faced with the same diagnostic challenges.

Jury Still Out on AI

Even as the technology accelerates at a dizzying pace, AI continues to divide opinion among the general population. Its biggest cheerleaders – tech icons like Elon Musk and Mark Zuckerberg – believe that we’re on the cusp of a global revolution.

According to Pew Research Center, however, over half (52%) of US adults are “more concerned than excited” about the growing use of AI, while 60% say they would be uncomfortable with their healthcare provider relying on the technology.

In recent months, concern over the spread of misinformation has grown, with AI at the center of a number of high-profile gaffes. Earlier this year, for instance, Google’s Gemini drew the ire of Musk, who branded the platform “racist” and “anti-civilizational.”

While these findings hint at a promising future for AI in medicine, they also serve as a cautionary tale: the industry, and the wider public, should maintain a healthy skepticism where AI is concerned.

