Background: Artificial intelligence (AI) chatbots have demonstrated proficiency in structured knowledge assessments; however, there is limited research on their performance in scenarios involving diagnostic uncertainty, which requires careful interpretation and complex decision-making. This study aims to evaluate the efficacy of AI chatbots, GPT-4o and Claude-3, in addressing medical scenarios characterized by diagnostic uncertainty relative to Family Medicine residents.
Methods: Questions with diagnostic uncertainty were extracted from the Progress Tests administered by the Department of Family and Community Medicine at the University of Toronto between 2022 and 2023. Diagnostic uncertainty questions were defined as those presenting clinical scenarios where symptoms, clinical findings, and patient histories do not converge on a definitive diagnosis, necessitating nuanced diagnostic reasoning and differential diagnosis. These questions were administered to a cohort of 320 Family Medicine residents in their first (PGY-1) and second (PGY-2) postgraduate years and inputted into GPT-4o and Claude-3. Errors were categorized into statistical, information, and logical errors. Statistical analyses were conducted using a binomial generalized estimating equation model, paired t-tests, and chi-squared tests.
Results: Compared to the residents, both chatbots scored lower on diagnostic uncertainty questions (p < 0.01). PGY-1 residents achieved a correctness rate of 61.1% (95% CI: 58.4-63.7), and PGY-2 residents achieved 63.3% (95% CI: 60.7-66.1). In contrast, Claude-3 correctly answered 57.7% (n = 52/90) of questions, and GPT-4o correctly answered 53.3% (n = 48/90). Claude-3 had a longer mean response time (24.0 s, 95% CI: 21.0-32.5 vs. 12.4 s, 95% CI: 9.3-15.3; p < 0.01) and produced longer answers (2001 characters, 95% CI: 1845-2212 vs. 1596 characters, 95% CI: 1395-1705; p < 0.01) compared to GPT-4o. Most errors by GPT-4o were logical errors (62.5%).
Conclusions: While AI chatbots like GPT-4o and Claude-3 demonstrate potential in handling structured medical knowledge, their performance in scenarios involving diagnostic uncertainty remains suboptimal compared to human residents.
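As a rough check on the chatbot figures reported in the Results, the sketch below recomputes the proportions from the stated counts (52/90 and 48/90) and attaches a 95% Wilson score interval to each. The Wilson interval is an assumption chosen for illustration and is not the study's method: the abstract reports a binomial generalized estimating equation model, paired t-tests, and chi-squared tests, and the resident confidence intervals come from that analysis and cannot be reproduced from the abstract alone. The function name `wilson_interval` and the `counts` dictionary are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Return the (lower, upper) Wilson score interval for a binomial proportion.

    z=1.96 corresponds to an approximate 95% confidence level.
    """
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


# Counts taken from the Results: correct answers out of 90 questions.
counts = {"Claude-3": (52, 90), "GPT-4o": (48, 90)}

for name, (correct, total) in counts.items():
    low, high = wilson_interval(correct, total)
    print(f"{name}: {correct / total:.1%} correct "
          f"(95% Wilson CI: {low:.1%} to {high:.1%})")
```

With only 90 questions per chatbot, intervals of this kind are wide, so the sketch is useful mainly for seeing how much uncertainty surrounds each chatbot's point estimate, not for re-deriving the reported p-values.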
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11470580
DOI: http://dx.doi.org/10.1186/s12909-024-06115-5
BMJ Case Rep
January 2025
General Surgery, Whipps Cross University Hospital NHS Trust, London, UK.
Intra-abdominal lymphangioma, a rare benign lymphatic malformation resulting from an obstruction to lymphatic channels, often has non-specific clinical manifestations. Low incidence rates of this condition, paired with its unusual presentation and ambiguous radiological appearance, commonly lead to diagnostic uncertainty. This pathology can result in significant morbidity and mortality, emphasising the need to achieve early diagnosis and management despite these challenges.
PLoS One
January 2025
Tranzo, Scientific Center for Care and Wellbeing, Tilburg School of Social and Behavioral Sciences, Tilburg University, Tilburg, The Netherlands.
Objective: An increasing number of people resume life after cancer treatment. Although the (long-term) side effects of cancer and its treatment can be significant, less is known about their impact on cancer survivors' participation in daily life. The aim of this study was to explore the common experiences of cancer survivors in resuming life after treatment.
J Vis
January 2025
Laboratoire des Systèmes Perceptifs, Département d'études cognitives, École normale supérieure, PSL University, France.
Visual perception has been described as a dynamic process where incoming visual information is combined with what has been seen before to form the current percept. Such a process can result in multiple visual aftereffects that can be attractive toward or repulsive away from past visual stimulation. A lot of research has been conducted on what functional role the mechanisms that produce these aftereffects may play.
Alzheimers Dement
December 2024
Neurology Department Infanta Leonor Hospital, Madrid, Spain.
Background: Biomarkers are essential for making a highly accurate diagnosis in patients with cognitive and behavioral complaints. However, molecular imaging biomarkers do not always provide an answer in daily clinical practice.
Methods: Retrospective, descriptive study of patients who underwent amyloid PET scans (APscans), performed according to rational use of this technique, between January 2019 and November 2023 in the Neurology Department, Infanta Leonor Hospital, Madrid, Spain.
Background: The Centiloid method (CL) was introduced as a tracer-independent measure for cortical amyloid load and is now commonly used in Alzheimer's disease (AD) clinical trials. To facilitate its implementation into clinical settings, the AMYPAD consortium set out to integrate existing literature and recent work from the consortium to provide clinical context-of-use recommendations of the Centiloid scale, which has been submitted to the European Medicine Agency for endorsement as a Biomarker Qualification Opinion.
Method: The literature was screened on PubMed on 7/11/23 to identify articles mentioning "Centiloid".