Background: This study investigates the potential of artificial intelligence (AI), specifically large language models (LLMs) such as ChatGPT, to enhance decision support in diagnosing epilepsy. AI tools can improve diagnostic accuracy, efficiency, and decision-making speed. The aim of this study was to compare the level of agreement in epilepsy diagnosis between human experts (epileptologists) and AI (ChatGPT), using the 2014 International League Against Epilepsy (ILAE) criteria, and to identify potential predictors of diagnostic errors made by ChatGPT.

Methods: A retrospective analysis was conducted on data from 597 patients who visited the emergency department for either a first epileptic seizure or a recurrence. Diagnoses made by experienced epileptologists were compared with those made by ChatGPT 4.0, which was trained on the 2014 ILAE epilepsy definition. Agreement between human and AI diagnoses was assessed using Cohen's kappa statistic. Sensitivity and specificity were compared using 2 × 2 contingency tables, and multivariate analyses were performed to identify variables associated with diagnostic errors.

Results: Neurologists diagnosed epilepsy in 216 patients (36.2%), while ChatGPT diagnosed it in 109 patients (18.2%). Agreement between neurologists and ChatGPT was very low, with a Cohen's kappa of -0.01 (95% confidence interval, CI: -0.08 to 0.06). ChatGPT's sensitivity was 17.6% (95% CI: 14.5-20.6), specificity was 81.4% (95% CI: 78.2-84.5), positive predictive value was 34.8% (95% CI: 31.0-38.6), and negative predictive value was 63.5% (95% CI: 59.6-67.4). ChatGPT made diagnostic errors in 41.7% of cases, with errors more frequent in older patients and in those with specific medical conditions. Correct classification was associated with acute symptomatic seizures of unknown etiology.

Conclusions: ChatGPT 4.0 does not match human clinicians' performance in diagnosing epilepsy, performing poorly at identifying epilepsy but better at recognizing non-epileptic cases. The overall concordance between human clinicians and AI is extremely low. Further research is needed to improve the diagnostic accuracy of ChatGPT and other LLMs.
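All of the agreement statistics reported above follow from a single 2 × 2 contingency table (ChatGPT's diagnosis versus the neurologists' reference diagnosis). As a minimal sketch, the counts below are back-calculated from the reported sensitivity, specificity, and predictive values — they are approximations for illustration, not figures taken from the paper:

```python
# Assumed 2x2 counts, back-calculated (approximately) from the abstract's
# reported sensitivity, specificity, PPV, and NPV; neurologists are the
# reference standard.
tp, fp, fn, tn = 38, 71, 178, 310
n = tp + fp + fn + tn  # 597 patients

sensitivity = tp / (tp + fn)   # epilepsy cases ChatGPT identified
specificity = tn / (tn + fp)   # non-epilepsy cases ChatGPT identified
ppv = tp / (tp + fp)           # positive predictive value
npv = tn / (tn + fn)           # negative predictive value

# Cohen's kappa: observed agreement corrected for chance agreement
po = (tp + tn) / n
pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
kappa = (po - pe) / (1 - pe)

print(f"sensitivity={sensitivity:.1%} specificity={specificity:.1%} "
      f"PPV={ppv:.1%} NPV={npv:.1%} kappa={kappa:.3f}")
```

With these counts, kappa comes out near zero even though raw agreement is about 58%, illustrating why chance-corrected agreement is the more informative statistic here.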

Source: http://dx.doi.org/10.3390/jcm14020322

