Off-the-Shelf Large Language Models for Causality Assessment of Individual Case Safety Reports: A Proof-of-Concept with COVID-19 Vaccines.

Andrea Abate, Elisa Poncato, Maria Antonietta Barbieri, Greg Powell, Andrea Rossi, Simay Peker, Anders Hviid, Andrew Bate, Maurizio Sessa

Drug Saf

Department of Drug Design and Pharmacology, University of Copenhagen, Jagtvej 160, 2100, Copenhagen, Denmark.

Published: March 2025

Background: This study evaluated the feasibility of using ChatGPT and Gemini, two off-the-shelf large language models (LLMs), to automate causality assessments, focusing on adverse events following immunization (AEFIs) of myocarditis and pericarditis related to COVID-19 vaccines.

Methods: We assessed 150 COVID-19 vaccine-related cases of myocarditis and pericarditis reported to the Vaccine Adverse Event Reporting System (VAERS) in the United States of America (USA). Both the LLMs and human experts applied the World Health Organization (WHO) algorithm for vaccine causality assessment, and inter-rater agreement was measured as percentage agreement. Adherence to the WHO algorithm was evaluated by comparing each LLM's responses with the expected sequence of steps in the algorithm. Statistical analyses, including descriptive statistics and random forest modeling, explored case complexity (e.g., string length measurements) and the factors affecting LLM performance and adherence.
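The percentage agreement named in the Methods can be illustrated with a minimal sketch. The function name, the category labels, and the four example ratings below are hypothetical and chosen only for illustration; they are not the study's data or code.

```python
from typing import Sequence


def percent_agreement(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Return the share of cases (in percent) where two raters assign the same label."""
    if len(rater_a) != len(rater_b):
        raise ValueError("Both raters must assess the same number of cases.")
    if len(rater_a) == 0:
        raise ValueError("At least one case is required.")
    # Count the cases where both raters chose the same category.
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


# Hypothetical causality classifications for four cases (illustrative only).
human = ["consistent", "indeterminate", "inconsistent", "consistent"]
llm = ["consistent", "indeterminate", "consistent", "consistent"]

score = percent_agreement(human, llm)
print(f"{score:.1f}%")  # 3 of 4 labels match -> 75.0%
```

Percentage agreement does not correct for agreement expected by chance, so it is usually read alongside the distribution of categories among the cases.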

Results: ChatGPT showed higher adherence to the WHO algorithm (34%) than Gemini (7%) and had moderate agreement (71%) with human experts, whereas Gemini had fair agreement (53%). Both LLMs often failed to recognize listed AEFIs, with ChatGPT and Gemini incorrectly identifying 6.7% and 13.3% of AEFIs, respectively. ChatGPT showed inconsistencies in 8.0% of cases and Gemini in 46.7%. For ChatGPT, adherence to the algorithm was associated with lower string complexity in prompt sections. The random forest analysis achieved an accuracy of 55% (95% confidence interval: 35.7-73.5) for predicting ChatGPT's adherence to the WHO algorithm.

Conclusion: We identified notable limitations in the use of ChatGPT and Gemini to aid causality assessments in vaccine safety. ChatGPT performed better, with higher adherence and agreement with human experts. In the investigated scenario, both models are better suited as complementary tools to human expertise.

DOI: http://dx.doi.org/10.1007/s40264-025-01531-y

Similar Publications

Artificial intelligence integration in surgery through hand and instrument tracking: a systematic literature review.

Front Surg

February 2025

The Loyal and Edith Davis Neurosurgical Research Laboratory, Department of Neurosurgery, Barrow Neurological Institute, St. Joseph's Hospital and Medical Center, Phoenix, AZ, United States.

Kivanc Yangi, Thomas J On, Yuan Xu, Arianna S Gholami, Jinpyo Hong

Objective: This systematic literature review of the integration of artificial intelligence (AI) applications in surgical practice through hand and instrument tracking provides an overview of recent advancements and analyzes current literature on the intersection of surgery with AI. Distinct AI algorithms and specific applications in surgical practice are also examined.

Methods: An advanced search using medical subject heading terms was conducted in Medline (via PubMed), SCOPUS, and Embase databases for articles published in English.

AI-Powered Analysis of Weight Loss Reports from Reddit: Unlocking Social Media's Potential in Dietary Assessment.

Nutrients

February 2025

Computer Simulation, Genomics and Data Analysis Laboratory, Department of Food Science and Nutrition, School of the Environment, University of the Aegean, 81400 Myrina, Greece.

Efstathios Kaloudis, Victoria Kouti, Foteini-Maria Triantafillou, Patroklos Ventouris, Rafail Pavlidis

The increasing use of social media for sharing health and diet experiences presents new opportunities for nutritional research and dietary assessment. Large language models (LLMs) and artificial intelligence (AI) offer innovative approaches to analyzing self-reported data from online communities. This study explores weight loss experiences associated with the ketogenic diet (KD) using user-generated content from Reddit, aiming to identify trends and potential biases in self-reported outcomes.

Depression and Anxiety in Patients with Psoriasis: A Comprehensive Analysis Combining Bibliometrics, Latent Dirichlet Allocation, and HJ-Biplot.

Healthcare (Basel)

February 2025

Escuela de Medicina, Colegio de Ciencias de la salud, Universidad San Francisco de Quito, Quito 170901, Ecuador.

Aline Siteneski, Karime Montes-Escobar, Javier de la Hoz-M, German Josuet Lapo-Talledo, Geovanna Gutiérrez Moreno

Patients with psoriasis often experience psychiatric comorbidities, such as depression and anxiety. These comorbidities can lead to poorer adherence to treatment regimens, reduced effectiveness of therapies, and a heightened disease burden. This study aims to explore the scientific output related to psoriasis, depression, and anxiety using a comprehensive analysis combining bibliometric statistical methods.

Study on the "digital divide" in the continuous utilization of Internet medical services for older adults: Combination with PLS-SEM and fsQCA analysis approach.

Int J Equity Health

March 2025

School of Public Health, Peking University, Beijing, China.

Wang Yu, Ying Ji, Zhijing Li, Kun Wang, Xue Jiang

Background: With the rapid digitalization of healthcare and an aging population, understanding the factors influencing older adults' sustained adoption of Internet medical services is critical. However, existing research often oversimplifies these factors by relying on linear models. This study integrates Partial Least Squares Structural Equation Modeling (PLS-SEM) and fuzzy-set Qualitative Comparative Analysis (fsQCA) to explore the complex pathways driving continued use.
