In a recent study, GPT-4 (Generative Pre-trained Transformer 4), a generative artificial intelligence (AI) program developed by OpenAI, diagnosed elderly patients with extensive medical histories and long hospital stays more accurately than clinicians did. The study suggests that AI technology has the potential to identify missed diagnoses and improve patient outcomes. However, while the results are promising, the use of AI in clinical diagnosis carries certain limitations and challenges.
The study involved six patients aged 65 or older with delayed diagnoses. GPT-4 accurately diagnosed four of the six patients, whereas clinicians accurately diagnosed only two. When differential diagnoses were considered, GPT-4's accuracy improved to five of six, compared with three of six for the clinicians. Additionally, a medical diagnostic decision tool called Isabel DDx Companion initially diagnosed none of the patients accurately, and only two of the six when provided with differential diagnosis information.
According to the lead author, GPT-4 can identify diagnoses that clinicians may have overlooked. It can assist in analyzing diagnostically difficult clinical situations and alert clinicians to possible underlying malignancies or drug side effects. The tool could be particularly valuable in lower-income countries with limited access to specialists, as it can provide suggestions similar to those a specialist would offer.
GPT-4's success in diagnosing these patients can be attributed to the availability of extensive medical histories, including radiological and pharmacological information. The study focused on elderly patients, who often have multiple comorbidities that make timely and accurate diagnosis challenging. GPT-4 can help clinicians identify diagnoses they may otherwise have missed, thus reducing the time to initial diagnosis in this patient population.
Despite its impressive accuracy, GPT-4 has clear limitations. The program struggled with certain conditions, particularly multifocal infections: it failed to pinpoint the source of a recurrent infection in one patient and did not suggest relevant infection testing for most patients. These shortcomings underscore the importance of human expertise and the limits of AI in complex diagnostic scenarios.
The study’s small sample size is another limitation. It analyzed the medical histories of only six patients from a single hospital unit specializing in geriatrics. While the results are promising, further research involving larger, more diverse patient populations is necessary to validate the findings and determine the generalizability of GPT-4’s diagnostic capabilities.
The authors emphasize that GPT-4 should be viewed as a tool to strengthen a clinician's confidence in a diagnosis or to provide suggestions akin to those of a specialist. The program can help identify potential drug side effects or abnormal imaging findings, particularly in settings where immediate subspecialty consultation is not available. However, caution is warranted: GPT-4 can reproduce inaccurate conclusions when the medical histories it is given are incorrect.
Generative AI programs like GPT-4 show promise in improving clinician responses and reducing missed diagnoses, particularly when extensive medical histories are available. However, the limitations and challenges described above must be addressed: further research with larger, more diverse patient populations and rigorous validation is necessary before AI can be widely implemented in clinical practice. Nonetheless, AI has the potential to supplement clinician expertise and improve patient outcomes in certain diagnostic scenarios, provided its capabilities are weighed against the inherent limitations of its current state of development.