AI Article Synopsis

  • Large language models (LLMs) such as GPT-3.5 and ChatGPT perform well on many tasks in zero- and few-shot settings, without task-specific training. This study tests them on zero-shot medical evidence summarization across six clinical domains.
  • The models are evaluated with both automatic metrics and human assessment; the automatic metrics often do not correlate strongly with summary quality.
  • The LLMs can generate factually inconsistent summaries, make overly convincing or uncertain statements, struggle to identify the salient information, and are more error-prone on longer inputs, which raises the risk of harm from misinformation in high-stakes medical settings.

Article Abstract

Recent advances in large language models (LLMs) have demonstrated remarkable successes in zero- and few-shot performance on various downstream tasks, paving the way for applications in high-stakes domains. In this study, we systematically examine the capabilities and limitations of LLMs, specifically GPT-3.5 and ChatGPT, in performing zero-shot medical evidence summarization across six clinical domains. We conduct both automatic and human evaluations, covering several dimensions of summary quality. Our study demonstrates that automatic metrics often do not strongly correlate with the quality of summaries. Furthermore, informed by our human evaluations, we define a terminology of error types for medical evidence summarization. Our findings reveal that LLMs could be susceptible to generating factually inconsistent summaries and making overly convincing or uncertain statements, leading to potential harm due to misinformation. Moreover, we find that models struggle to identify the salient information and are more error-prone when summarizing over longer textual contexts.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10449915
DOI: http://dx.doi.org/10.1038/s41746-023-00896-7

Publication Analysis

Top Keywords

medical evidence: 12
evidence summarization: 12
large language: 8
language models: 8
human evaluations: 8
evaluating large: 4
models medical: 4
summarization advances: 4
advances large: 4
models llms: 4

Similar Publications

Digging deeper into necrotizing enterocolitis: bridging clinical, microbial, and molecular perspectives.

Gut Microbes

December 2025

Department of Pediatrics, Key Laboratory of Birth Defects and Related Diseases of Women and Children (Ministry of Education), West China Second University Hospital, Sichuan University, Chengdu, China.

Necrotizing Enterocolitis (NEC) is a severe, life-threatening inflammatory condition of the gastrointestinal tract, especially affecting preterm infants. This review consolidates evidence from various biomedical disciplines to elucidate the complex pathogenesis of NEC, integrating insights from clinical, microbial, and molecular perspectives. It emphasizes the modulation of NEC-associated inflammatory pathways by probiotics and novel biologics, highlighting their therapeutic potential.

Current and Emerging Therapies for Lysosomal Storage Disorders.

Drugs

January 2025

Lysosomal Storage Disorders Unit, Royal Free London NHS Foundation Trust, University College London, London, NW3 2QG, UK.

Lysosomal storage disorders (LSDs) are rare inherited metabolic disorders characterized by defects in the function of specific enzymes responsible for breaking down substrates within cellular organelles (lysosomes) essential for the processing of macromolecules. Undigested substrate accumulates within lysosomes, leading to cellular dysfunction, tissue damage, and clinical manifestations. Clinical features vary depending on the degree and type of enzyme deficiency, the type and extent of substrate accumulated, and the tissues affected.

Pharmacologic Management of Heart Failure with Preserved Ejection Fraction (HFpEF) in Older Adults.

Drugs Aging

January 2025

Program for the Care and Study of the Aging Heart, Department of Medicine, Weill Cornell Medicine, 420 East 70th St, New York, NY, LH-36510063, USA.

There are several pharmacologic agents that have been touted as guideline-directed medical therapy for heart failure with preserved ejection fraction (HFpEF). However, it is important to recognize that older adults with HFpEF also contend with an increased risk for adverse effects from medications due to age-related changes in pharmacokinetics and pharmacodynamics of medications, as well as the concurrence of geriatric conditions such as polypharmacy and frailty. With this review, we discuss the underlying evidence for the benefits of various treatments in HFpEF and incorporate key considerations for older adults, a subpopulation that may be at higher risk for adverse drug events.

Purpose: To review the current evidence on the association between salivary protein profile and dental caries in children during mixed dentition stage.

Methods: This systematic review followed the PRISMA 2020 guidelines. Searches were run in PubMed, Scopus and Embase along with gray literature.

Impact of hemoadsorption with CytoSorb® on meropenem and piperacillin exposure in critically ill patients in a post-CKRT setup: a single-center, retrospective data analysis.

Intensive Care Med Exp

January 2025

Freie Universität Berlin and Humboldt-Universität Zu Berlin, Department of Anesthesiology and Intensive Care Medicine, Charité-Universitätsmedizin Berlin, Campus Benjamin Franklin, Berlin, Germany.

Purpose: CytoSorb® (CS) adsorbent is a hemoadsorption filter for extracorporeal blood purification often integrated into continuous kidney replacement therapy (CKRT). It is primarily used in critically ill patients with sepsis and related conditions, including cytokine storms and systemic inflammatory responses. To date, there is neither evidence nor a recommendation for the use of CS filters in sepsis (22).
