Performance of GPT-4 and GPT-3.5 in generating accurate and comprehensive diagnoses across medical subspecialties.

Dik Wai Anderson Luk, Whitney Chin Tung Ip, Yat-Fung Shea

J Chin Med Assoc

Department of Medicine, Queen Mary Hospital, University of Hong Kong, Hong Kong, China.

Published: March 2024

Artificial intelligence has shown promising potential for diagnosing complex medical cases, with Generative Pre-Trained Transformer 4 (GPT-4) being the most recent advancement in this field. This study evaluated the diagnostic performance of GPT-4 in comparison with that of its predecessor, GPT-3.5, using 81 complex medical case records from the New England Journal of Medicine. The cases were categorized as cognitive impairment, infectious disease, rheumatology, or drug reactions. GPT-4 achieved a primary diagnostic accuracy of 38.3%, which improved to 71.6% when differential diagnoses were included. In 84.0% of cases, primary diagnoses were made by conducting investigations suggested by GPT-4. GPT-4 outperformed GPT-3.5 in all subspecialties except for drug reactions. GPT-4 demonstrated the highest performance in infectious diseases and drug reactions, whereas it underperformed in cases of cognitive impairment. These findings indicate that GPT-4 can provide reasonably accurate diagnoses, comprehensive differential diagnoses, and appropriate investigations. However, its performance varies across subspecialties.
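
The abstract reports percentages out of 81 cases but not the underlying case counts. As a rough illustration only, the short Python sketch below back-calculates which whole-number counts are consistent with each reported percentage. The labels and the helper name `implied_count` are hypothetical, and the resulting counts are inferred by arithmetic, not stated in the source.

```python
# Back-calculate case counts consistent with the percentages in the abstract.
# Assumption: each percentage is a count out of all 81 cases, rounded to one decimal.
TOTAL_CASES = 81  # stated in the abstract

# Percentages as reported; the labels are paraphrases, not the authors' wording.
reported = {
    "primary diagnosis correct": 38.3,
    "correct when differentials are included": 71.6,
    "diagnosis made via suggested investigations": 84.0,
}


def implied_count(percent, total=TOTAL_CASES):
    """Return every integer count k where 100*k/total rounds to `percent`."""
    return [k for k in range(total + 1) if round(100 * k / total, 1) == percent]


for label, pct in reported.items():
    print(f"{label}: {pct}% of {TOTAL_CASES} -> count(s) {implied_count(pct)}")
```

Under that assumption, each percentage matches exactly one count (31, 58, and 68 cases respectively), which suggests the reported figures are internally consistent with a denominator of 81.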

DOI: http://dx.doi.org/10.1097/JCMA.0000000000001064

