Study: ChatGPT Gets Medical Diagnoses Wrong Half of the Time

AI is not yet advanced enough to diagnose complex medical problems, new study finds.

AI should not be relied on for medical diagnoses, according to new research published in the journal PLOS ONE. The study, conducted at Western University in London, Ontario, found that, when presented with a series of medical cases, ChatGPT gave the correct diagnosis less than half of the time.

The researchers asked it to choose the correct diagnosis from a set of options and to provide an explanation for its choice. It was correct just 49% of the time, although it proved adept at simplifying complex medical terminology.

Amid a flurry of activity in the healthcare space, with researchers keen to explore potential use cases for the fast-growing technology, these findings suggest that robots will not be prowling the hospital wards anytime soon.

AI Not Ready for Healthcare

Researchers presented ChatGPT with 150 complex medical cases. The platform was asked to select the correct diagnosis from multiple-choice options and to give its rationale. The team observed that it was right only 49% of the time, although its simplified answers were competent and sounded convincing.

Published in July, the study set out to evaluate the “diagnostic accuracy and utility of ChatGPT in medical education,” according to CBC. Lead researcher Dr. Amrit Kirpalani said: “We wanted to know, how would it deal with…those complicated cases that we see in medicine?”


While the 49% accuracy rate will do little to calm the ongoing debate around AI misinformation, researchers were encouraged by the platform’s capacity to simplify complex medical terminology. Kirpalani continued: “I think we can harness this for education.”

Researchers Exploring Potential Use Cases

These findings are another twist in what is turning out to be a long-running saga, with researchers determined to find use cases for AI within the healthcare industry. A Stanford University study recently set out to evaluate whether LLMs could be used to diagnose OCD, a notoriously difficult condition to identify.

Remarkably, AI was found to outperform healthcare professionals in several instances, with ChatGPT-4 correctly identifying OCD in every patient it was presented with. By contrast, psychology doctoral trainees were only able to diagnose OCD 81.5% of the time, with primary care physicians coming in at 49.5%.

The Western University study was originally conducted in 2023 using ChatGPT 3.5. In light of the Stanford findings, the scientists can only speculate as to how an updated model would perform when faced with the same diagnostic challenges.

Jury Still Out on AI

Even as the technology accelerates at a dizzying pace, AI continues to divide opinion among the general population. Its biggest cheerleaders – tech icons like Elon Musk and Mark Zuckerberg – believe that we’re on the cusp of a global revolution.

According to Pew Research Center, however, over half (52%) of US adults are “more concerned than excited” about the growing use of AI, while 60% say they would be uncomfortable with their healthcare provider relying on the technology.

In recent months, concern over the spread of misinformation has grown, with AI at the center of a number of high-profile gaffes. Earlier this year, for instance, Google’s Gemini drew the ire of Musk, who branded the platform “racist” and “anti-civilizational.”

While these findings hint at a promising future for AI in medicine, they also serve as a cautionary tale: the industry, and the wider public, should maintain a healthy skepticism where AI is concerned.

