Background: This study evaluated the feasibility of using ChatGPT and Gemini, two off-the-shelf large language models (LLMs), to automate causality assessments, focusing on adverse events following immunization (AEFIs) of myocarditis and pericarditis related to COVID-19 vaccines.

Methods: We assessed 150 COVID-19-related cases of myocarditis and pericarditis reported to the Vaccine Adverse Event Reporting System (VAERS) in the United States of America (USA). Both the LLMs and human experts applied the World Health Organization (WHO) algorithm for vaccine causality assessment, and inter-rater agreement was measured as percentage agreement. Adherence to the WHO algorithm was evaluated by comparing LLM responses with the expected sequence of the algorithm. Statistical analyses, including descriptive statistics and random forest modeling, explored case complexity (e.g., string length measurements) and factors affecting LLM performance and adherence.
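As an illustration of the agreement measure named in the Methods, percentage agreement is the share of cases in which two raters assign the same category. The following is a minimal sketch only: the function name `percent_agreement` and the example labels are hypothetical and are not taken from the study's data or code.

```python
def percent_agreement(ratings_a, ratings_b):
    """Return the percentage of cases where two raters gave the same label."""
    if len(ratings_a) != len(ratings_b):
        raise ValueError("Both raters must classify the same number of cases.")
    if not ratings_a:
        raise ValueError("At least one case is required.")
    # Count the cases on which the two raters agree exactly.
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return 100.0 * matches / len(ratings_a)


# Hypothetical causality labels for four cases (illustrative values only).
expert_labels = ["consistent", "indeterminate", "inconsistent", "consistent"]
llm_labels = ["consistent", "indeterminate", "consistent", "consistent"]

print(percent_agreement(expert_labels, llm_labels))
```

Percentage agreement does not correct for agreement expected by chance, which is worth keeping in mind when reading the figures reported in the Results.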

Results: ChatGPT showed higher adherence to the WHO algorithm (34%) than Gemini (7%) and had moderate agreement (71%) with human experts, whereas Gemini had fair agreement (53%). Both LLMs often failed to recognize listed AEFIs, with ChatGPT and Gemini incorrectly identifying 6.7% and 13.3% of AEFIs, respectively. ChatGPT showed inconsistencies in 8.0% of cases and Gemini in 46.7%. For ChatGPT, adherence to the algorithm was associated with lower string complexity in prompt sections. The random forest analysis achieved an accuracy of 55% (95% confidence interval: 35.7-73.5) for predicting ChatGPT's adherence to the WHO algorithm.

Conclusion: We identified notable limitations in the use of ChatGPT and Gemini to aid causality assessments in vaccine safety. ChatGPT performed better, with higher adherence and agreement with human experts. In the investigated scenario, both models are better suited as complementary tools to human expertise.

Source
http://dx.doi.org/10.1007/s40264-025-01531-y

Similar Publications

Artificial intelligence integration in surgery through hand and instrument tracking: a systematic literature review.

Front Surg

February 2025

The Loyal and Edith Davis Neurosurgical Research Laboratory, Department of Neurosurgery, Barrow Neurological Institute, St. Joseph's Hospital and Medical Center, Phoenix, AZ, United States.

Objective: This systematic literature review of the integration of artificial intelligence (AI) applications in surgical practice through hand and instrument tracking provides an overview of recent advancements and analyzes current literature on the intersection of surgery with AI. Distinct AI algorithms and specific applications in surgical practice are also examined.

Methods: An advanced search using medical subject heading terms was conducted in Medline (via PubMed), SCOPUS, and Embase databases for articles published in English.

AI-Powered Analysis of Weight Loss Reports from Reddit: Unlocking Social Media's Potential in Dietary Assessment.

Nutrients

February 2025

Computer Simulation, Genomics and Data Analysis Laboratory, Department of Food Science and Nutrition, School of the Environment, University of the Aegean, 81400 Myrina, Greece.

The increasing use of social media for sharing health and diet experiences presents new opportunities for nutritional research and dietary assessment. Large language models (LLMs) and artificial intelligence (AI) offer innovative approaches to analyzing self-reported data from online communities. This study explores weight loss experiences associated with the ketogenic diet (KD) using user-generated content from Reddit, aiming to identify trends and potential biases in self-reported outcomes.

Patients with psoriasis often experience psychiatric comorbidities, such as depression and anxiety. These comorbidities can lead to poorer adherence to treatment regimens, reduced effectiveness of therapies, and a heightened disease burden. This study aims to explore the scientific output related to psoriasis, depression, and anxiety using a comprehensive analysis combining bibliometric statistical methods.

Background: With the rapid digitalization of healthcare and an aging population, understanding the factors influencing older adults' sustained adoption of Internet medical services is critical. However, existing research often oversimplifies these factors by relying on linear models. This study integrates Partial Least Squares Structural Equation Modeling (PLS-SEM) and fuzzy-set Qualitative Comparative Analysis (fsQCA) to explore the complex pathways driving continued use.
