In about 30% of the chatbot's answers, accurate and erroneous suggestions were blended together, making the mistakes harder to identify.
The internet has transformed the way people gather information, and for those seeking medical knowledge, the change has been especially significant. With the rise of advanced technologies like ChatGPT, a new question arises: can artificial intelligence provide accurate medical recommendations? Researchers from Brigham and Women's Hospital examined this question in a recent study published in JAMA Oncology.
ChatGPT, a sophisticated AI chatbot, has been making its mark as a source of medical advice. However, the study discovered that in around one-third of cases, ChatGPT provided recommendations for cancer treatment that didn’t align with the respected National Comprehensive Cancer Network (NCCN) guidelines. This underscores a crucial lesson: while technology is a valuable resource, it has limitations that must be recognized.
Dr. Danielle Bitterman, an expert in Radiation Oncology and part of the Artificial Intelligence in Medicine (AIM) Program at Mass General Brigham, emphasized, “ChatGPT responses can sound a lot like a human and can be quite convincing. But, when it comes to clinical decision-making, there are so many subtleties for every patient’s unique situation. A right answer can be very nuanced, and not necessarily something ChatGPT or another large language model can provide.”
Artificial intelligence is rapidly entering medicine and has the potential to reshape patient care. Mass General Brigham, a leading academic health system, is at the forefront of exploring how AI can be integrated responsibly into healthcare delivery, support for medical professionals, and administrative processes.
To assess ChatGPT's accuracy, Bitterman and her colleagues measured how well its answers aligned with NCCN guidelines for three common cancers: breast, prostate, and lung. They asked ChatGPT for treatment approaches across different levels of disease severity, using a total of 104 prompts.
Notably, although 98 percent of responses included at least some recommendations that aligned with NCCN guidelines, 34 percent also contained suggestions that diverged from them. These…