Background: Diagnosis is a core component of effective health care, but misdiagnosis is common and can put patients at risk. Diagnostic decision support systems can play a role in improving diagnosis by physicians and other health care workers. Symptom checkers (SCs) have been designed to improve diagnosis and triage (ie, which level of care to seek) by patients.
Objective: The aim of this study was to evaluate the performance of the new large language model ChatGPT (versions 3.5 and 4.0), the widely used WebMD SC, and an SC developed by Ada Health in the diagnosis and triage of patients with urgent or emergent clinical problems compared with the final emergency department (ED) diagnoses and physician reviews.
Methods: We used previously collected, deidentified, self-report data from 40 patients presenting to an ED for care who used the Ada SC to record their symptoms prior to seeing the ED physician. Deidentified data were entered into ChatGPT versions 3.5 and 4.0 and WebMD by a research assistant blinded to diagnoses and triage. Diagnoses from all 4 systems were compared with the previously abstracted final ED diagnoses as well as with diagnoses and triage recommendations from 3 independent board-certified ED physicians who had blindly reviewed the self-report clinical data from Ada. Diagnostic accuracy was calculated as the proportion of diagnoses from ChatGPT, the Ada SC, the WebMD SC, and the independent physicians that matched at least one ED diagnosis (stratified as top-1 or top-3 matches). Triage accuracy was calculated as the proportion of recommendations from ChatGPT, WebMD, or Ada that agreed with at least 2 of the 3 independent physicians; discordant recommendations were rated "unsafe" or "too cautious."
Results: Overall, 30 and 37 cases had sufficient data for the diagnostic and triage analyses, respectively. The number of top-1 diagnosis matches for Ada, ChatGPT 3.5, ChatGPT 4.0, and WebMD was 9 (30%), 12 (40%), 10 (33%), and 12 (40%), respectively, with a mean rate of 47% for the physicians. The number of top-3 diagnosis matches for Ada, ChatGPT 3.5, ChatGPT 4.0, and WebMD was 19 (63%), 19 (63%), 15 (50%), and 17 (57%), respectively, with a mean rate of 69% for the physicians. The distribution of triage results for Ada was 62% (n=23) agree, 14% (n=5) unsafe, and 24% (n=9) too cautious; for ChatGPT 3.5, 59% (n=22) agree, 41% (n=15) unsafe, and 0% (n=0) too cautious; for ChatGPT 4.0, 76% (n=28) agree, 22% (n=8) unsafe, and 3% (n=1) too cautious; and for WebMD, 70% (n=26) agree, 19% (n=7) unsafe, and 11% (n=4) too cautious. The unsafe triage rate for ChatGPT 3.5 (41%) was significantly higher (P=.009) than that of Ada (14%).
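The match rates and the reported significance can be recomputed directly from the counts above. A minimal sketch follows; note that the abstract does not name the statistical test behind P=.009, so the Pearson chi-square on the 2x2 unsafe-triage table is an assumption (it does reproduce the reported value):

```python
import math

# Counts taken from the Results section
n_dx = 30  # cases with sufficient data for diagnostic analysis
top1 = {"Ada": 9, "ChatGPT 3.5": 12, "ChatGPT 4.0": 10, "WebMD": 12}
top1_rates = {k: round(100 * v / n_dx) for k, v in top1.items()}

# Unsafe triage comparison: ChatGPT 3.5 (15/37) vs Ada (5/37).
# Pearson chi-square on the 2x2 table; the test choice is an assumption.
a, b = 15, 37 - 15   # ChatGPT 3.5: unsafe, not unsafe
c, d = 5, 37 - 5     # Ada: unsafe, not unsafe
n = a + b + c + d
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
p = math.erfc(math.sqrt(chi2 / 2))   # chi-square survival function, df = 1
print(top1_rates, round(p, 3))
```

The chi-square p-value for df=1 is computed via the complementary error function, avoiding any dependency beyond the standard library.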
Conclusions: ChatGPT 3.5 had high diagnostic accuracy but a high unsafe triage rate. ChatGPT 4.0 had the poorest diagnostic accuracy, but a lower unsafe triage rate and the highest triage agreement with the physicians. The Ada and WebMD SCs performed better overall than ChatGPT. Unsupervised patient use of ChatGPT for diagnosis and triage is not recommended without improvements to triage accuracy and extensive clinical evaluation.
Full-text PDF: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10582809 (PMC)
DOI: http://dx.doi.org/10.2196/49995
Background: Biomedical research requires sophisticated understanding and reasoning across multiple specializations. While large language models (LLMs) show promise in scientific applications, their capability to safely and accurately support complex biomedical research remains uncertain.
Methods: We present a novel question-and-answer benchmark for evaluating LLMs in biomedical research.
Front Med (Lausanne)
January 2025
Clinical Informatics Fellowship Program, Baylor Scott & White Health, Round Rock, TX, United States.
Generative artificial intelligence (GenAI) is rapidly transforming various sectors, including healthcare and education. This paper explores the potential opportunities and risks of GenAI in graduate medical education (GME). We review the existing literature and provide commentary on how GenAI could impact GME, including five key areas of opportunity: electronic health record (EHR) workload reduction, clinical simulation, individualized education, research and analytics support, and clinical decision support.
J Allergy Clin Immunol Glob
February 2025
University Centre for Research and Development Department of Pharmaceutical Sciences, Chandigarh University Gharuan, Mohali, Punjab, India.
Glob Epidemiol
June 2025
Business Analytics (BANA) Program, Business School, University of Colorado, 1475 Lawrence St. Denver, CO 80217-3364, USA.
AI-assisted data analysis can help risk analysts better understand exposure-response relationships by making it relatively easy to apply advanced statistical and machine learning methods, check their assumptions, and interpret their results. This paper demonstrates the potential of large language models (LLMs), such as ChatGPT, to facilitate statistical analyses, including survival data analyses, for health risk assessments. Through AI-guided analyses using relatively recent and advanced methods such as Individual Conditional Expectation (ICE) plots using Random Survival Forests and Heterogeneous Treatment Effects (HTEs) estimated using Causal Survival Forests, population-level exposure-response functions can be disaggregated into individual-level exposure-response functions.
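The disaggregation described above can be illustrated with a short sketch. This is hypothetical code on simulated data: a standard scikit-learn random forest regressor stands in for the Random Survival Forests named in the paper so the example stays self-contained, and `sklearn.inspection.partial_dependence` with `kind="individual"` produces the ICE curves:

```python
# Sketch: ICE (Individual Conditional Expectation) curves disaggregate a
# population-level exposure-response function into per-individual curves.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import partial_dependence

rng = np.random.default_rng(0)
n = 200
exposure = rng.uniform(0, 10, n)
age = rng.uniform(20, 80, n)
# Simulated outcome with heterogeneous response: steeper slope at older ages
risk = 0.1 * exposure * (age / 50) + rng.normal(0, 0.2, n)
X = np.column_stack([exposure, age])

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, risk)

# One predicted-response curve per individual as exposure (feature 0) varies
ice = partial_dependence(model, X, features=[0], kind="individual")
curves = ice["individual"][0]   # rows: individuals, columns: exposure grid
print(curves.shape)
```

Plotting each row of `curves` against the exposure grid shows how individual exposure-response trajectories fan out around the population average, which is the heterogeneity the HTE methods in the paper then try to explain.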
J Community Hosp Intern Med Perspect
January 2025
Department of Medicine, West Virginia University, House Staff 4Floor HSC-N Morgantown, PO Box 9168, Morgantown, WV, USA.
Implantable cardiac devices, including cardiac pacemakers, are not without risk for infection, which carries a mortality and morbidity of around 5-15%. Gram-positive organisms are most common, accounting for 91% of cases, whereas gram-negative organisms are less common, found in 2% of cases. Here, we present a rare case of a gram-negative organism causing a pacemaker site infection.