Medical Misinformation in AI-Assisted Self-Diagnosis: Development of a Method (EvalPrompt) for Analyzing Large Language Models.

JMIR Form Res

Department of Management Sciences and Engineering, University of Waterloo, 200 University Avenue West, Waterloo, ON, N2L 3G1, Canada, 1 5198884567 ext 33279.

Published: March 2025

Background: Rapid integration of large language models (LLMs) in health care is sparking global discussion about their potential to revolutionize health care quality and accessibility. At a time when improving health care quality and access remains a critical concern for countries worldwide, the ability of these models to pass medical examinations is often cited as a reason to use them for medical training and diagnosis. However, the impact of their inevitable use as a self-diagnostic tool and their role in spreading health care misinformation has not been evaluated.

Objective: This study aims to assess the effectiveness of LLMs, particularly ChatGPT, from the perspective of an individual self-diagnosing to better understand the clarity, correctness, and robustness of the models.

Methods: We propose EvalPrompt (evaluation of LLM prompts), a comprehensive testing methodology that uses multiple-choice medical licensing examination questions to evaluate LLM responses. Experiment 1 prompts ChatGPT with open-ended questions to mimic real-world self-diagnosis use cases, and experiment 2 performs sentence dropout on the correct responses from experiment 1 to mimic self-diagnosis with missing information. Human assessors then rate the responses returned by ChatGPT in both experiments to evaluate its clarity, correctness, and robustness.
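The sentence dropout step in experiment 2 can be pictured with a minimal sketch. The abstract does not specify the exact procedure, so the code below rests on assumptions: that dropout removes one sentence at a time from the prompt text, and that sentences can be split on terminal punctuation. The function name `sentence_dropout` and the example prompt are hypothetical and do not come from the study.

```python
import re


def sentence_dropout(text: str) -> list[str]:
    """Return one variant per sentence, each with that single sentence removed.

    Assumption: sentences end in '.', '!' or '?' followed by whitespace.
    This is an illustrative splitter, not the study's actual procedure.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if len(sentences) < 2:
        # Dropping the only sentence would leave an empty prompt.
        return []
    return [
        " ".join(sentences[:i] + sentences[i + 1:])
        for i in range(len(sentences))
    ]


# Hypothetical self-diagnosis prompt, invented for illustration only.
prompt = (
    "I am 45 years old. "
    "I have had chest pain for two days. "
    "The pain worsens when I breathe in."
)
variants = sentence_dropout(prompt)
for variant in variants:
    print(variant)
```

Each variant would then be sent to the model and assessed in the same way as the original prompt, so that a change in the verdict signals sensitivity to the missing information.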

Results: In experiment 1, we found that ChatGPT-4.0 was deemed correct for 31% (29/94) of the questions by both nonexperts and experts, with only 34% (32/94) agreement between the 2 groups. In experiment 2, which assessed robustness, 61% (92/152) of the responses continued to be categorized as correct by all assessors. Measured against a passing threshold of 60%, ChatGPT-4.0 is therefore considered incorrect and unclear, though robust. This indicates that sole reliance on ChatGPT-4.0 for self-diagnosis could increase the risk of individuals being misinformed.
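The comparison against the 60% threshold is simple arithmetic, shown here as a short sketch using only the counts reported above. Two points are assumptions rather than statements from the abstract: that the 34% between-group agreement is the figure behind the "unclear" verdict, and that a rate equal to the threshold would count as passing. The names `PASS_THRESHOLD`, `counts`, and `passes` are illustrative.

```python
PASS_THRESHOLD = 0.60  # passing threshold stated in the abstract

# (numerator, denominator) pairs as reported in the Results.
# Assumption: the 32/94 agreement figure is the clarity measure.
counts = {
    "correctness": (29, 94),
    "clarity": (32, 94),
    "robustness": (92, 152),
}


def passes(numerator: int, denominator: int,
           threshold: float = PASS_THRESHOLD) -> bool:
    """Return True when the observed rate meets or exceeds the threshold."""
    return numerator / denominator >= threshold


verdicts = {name: passes(n, d) for name, (n, d) in counts.items()}

for name, (n, d) in counts.items():
    status = "pass" if verdicts[name] else "fail"
    print(f"{name}: {n}/{d} = {n / d:.1%} ({status})")
```

Only robustness clears the threshold, and only narrowly, which matches the abstract's summary of "incorrect and unclear, though robust".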

Conclusions: The results highlight the modest capabilities of LLMs, as their responses are often unclear and inaccurate. Any medical advice provided by LLMs should be approached cautiously because of the significant risk of misinformation. However, evidence suggests that LLMs are steadily improving and could play a role in health care systems in the future. To address the issue of medical misinformation, there is a pressing need for a comprehensive self-diagnosis dataset. Such a dataset could enhance the reliability of LLMs in medical applications by featuring more realistic prompt styles with minimal information across a broader range of medical fields.

Source: http://dx.doi.org/10.2196/66207
