Background: Systematic reviews (SRs) are hindered by the initial rigorous article screen, which delays access to reliable information synthesis.
Objective: To develop generic prompt templates for large language model (LLM)-driven abstract and full-text screening that can be adapted to different reviews.
Design: Diagnostic test accuracy.
Setting: 48 425 citations were tested for abstract screening across 10 SRs. Full-text screening evaluated all 12 690 freely available articles from the original search. Prompt development used the GPT4-0125-preview model (OpenAI).
Participants: None.
Measurements: Large language models were prompted to include or exclude articles based on SR eligibility criteria. Model outputs were compared with original SR author decisions after full-text screening to evaluate performance (accuracy, sensitivity, and specificity).
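The comparison described above can be sketched in a few lines: a minimal illustration, assuming each article carries a boolean reference decision (the SR authors' include or exclude after full-text screening) and a boolean model decision. The function name `screening_metrics` and the toy data are hypothetical and are not taken from the study.

```python
from typing import Dict, Sequence


def screening_metrics(reference: Sequence[bool], predicted: Sequence[bool]) -> Dict[str, float]:
    """Compare model include/exclude decisions with reference decisions.

    True means "include". Returns sensitivity, specificity, and accuracy.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")

    tp = tn = fp = fn = 0
    for ref, pred in zip(reference, predicted):
        if ref and pred:
            tp += 1          # correctly included
        elif ref and not pred:
            fn += 1          # relevant article missed by the model
        elif not ref and pred:
            fp += 1          # irrelevant article passed through
        else:
            tn += 1          # correctly excluded

    def ratio(numerator: int, denominator: int) -> float:
        # Undefined ratios (for example, no included articles) become NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }


# Toy example (hypothetical decisions for five citations).
reference_decisions = [True, True, False, False, False]
model_decisions = [True, False, False, True, False]
metrics = screening_metrics(reference_decisions, model_decisions)
print(metrics)
```

In a screening setting, sensitivity is usually the metric to protect, because a false negative removes a relevant article from the review, whereas a false positive only adds work at the next screening stage.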
Results: Optimized prompts using GPT4-0125-preview achieved a weighted sensitivity of 97.7% (range, 86.7% to 100%) and specificity of 85.2% (range, 68.3% to 95.9%) in abstract screening and weighted sensitivity of 96.5% (range, 89.7% to 100.0%) and specificity of 91.2% (range, 80.7% to 100%) in full-text screening across 10 SRs. In contrast, zero-shot prompts had poor sensitivity (49.0% abstract, 49.1% full-text). Across LLMs, Claude-3.5 (Anthropic) and GPT4 variants had similar performance, whereas Gemini Pro (Google) and GPT3.5 (OpenAI) models underperformed. Direct screening costs for 10 000 citations differed substantially: Where single human abstract screening was estimated to require more than 83 hours and $1666.67 USD, our LLM-based approach completed screening in under 1 day for $157.02 USD.
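The cost figures quoted above can be put side by side with simple arithmetic. The sketch below uses only the two dollar amounts and the 10 000-citation volume stated in the Results; the function name `cost_comparison` is hypothetical, and the derived per-citation costs assume that cost scales linearly with the number of citations, which the abstract does not state.

```python
# Figures quoted in the Results for screening 10 000 citations.
CITATIONS = 10_000
HUMAN_COST_USD = 1666.67  # single human abstract screening
LLM_COST_USD = 157.02     # LLM-based approach


def cost_comparison(human_usd: float, llm_usd: float, n_citations: int) -> dict:
    """Derive per-citation costs, absolute savings, and the cost ratio.

    Assumes cost is proportional to the number of citations screened.
    """
    return {
        "human_per_citation": human_usd / n_citations,
        "llm_per_citation": llm_usd / n_citations,
        "savings_usd": human_usd - llm_usd,
        "cost_ratio": human_usd / llm_usd,
    }


summary = cost_comparison(HUMAN_COST_USD, LLM_COST_USD, CITATIONS)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions, the human estimate is roughly 10.6 times the LLM estimate, a difference of about $1509.65 USD per 10 000 citations.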
Limitations: Further prompt optimizations may exist. Retrospective study. Convenience sample of SRs. Full-text screening evaluations were limited to free PubMed Central full-text articles.
Conclusion: We developed a generic prompt for abstract and full-text screening that achieves high sensitivity and specificity and can be adapted to other SRs and LLMs. Our prompting innovations may have value to SR investigators and researchers conducting similar criteria-based tasks across the medical sciences.
Primary Funding Source: None.
Source (DOI): http://dx.doi.org/10.7326/ANNALS-24-02189
Purpose: The proximal femur is a frequent site of cancer dissemination in the extremities. Patients treated surgically for skeletal metastases have poorer overall health compared to other orthopedic patients, with only one-third expected to survive two years post-surgery. Choosing a treatment that minimizes revision risk and ensures the implant outlives the patient is therefore crucial.
Clin Spine Surg
March 2025
Department of Orthopedics, Beth Israel Deaconess Medical Center, Harvard Medical School.
Study Design: Systematic review and meta-analysis.
Objective: To determine whether venous thromboembolism (VTE) prophylaxis is necessary after spine trauma and to assess the efficacy and safety profiles of anticoagulation agents.
Summary Of Background Data: Venous stasis, endothelial disruption, hypercoagulability, and orthopedic injury in spine trauma predispose 12%-64% of patients to deep vein thrombosis (DVT).
Nutr Rev
March 2025
Department of Epidemiology of the School of Public Health in Austin, The University of Texas Health Science Center at Houston (UTHealth Houston), Austin, TX 78701, United States.
Context: Given the diverse aspects of the family food environment, it is essential to clarify the availability of tools, the assessed dimensions, and the extent to which they offer a comprehensive and valid evaluation of the domestic food setting.
Objective: This systematic review aims to assess the validity and reliability of instruments gauging the food environment within the pediatric population.
Data Sources: A systematic literature search was conducted in the EMBASE, Medline (PubMed), SCOPUS, Web of Science, and PsycINFO databases until December 2023, resulting in the identification of 2850 potentially eligible articles.
Infez Med
March 2025
Masters' Program of Clinical Epidemiology and Biostatistics, Faculty of Health Sciences, Universidad Cientifica del Sur, Lima, 15067, Peru.
Introduction: The incidence of dengue and its complications is increasing globally, mainly in areas where it is endemic; however, few studies have evaluated outcomes in kidney transplant recipients (KTR). The present analysis aimed to determine the incidence, signs and symptoms, and allograft dysfunction in dengue-infected KTR.
Methods: Systematic review of the literature following the PRISMA 2020 guidelines, with studies included until November 24, 2023.
Introduction: Dengue is a mosquito-borne viral disease. It has been associated with high maternal and foetal morbidity and mortality. Therefore, this study aimed to describe the outcomes of Dengue infection in pregnant women in terms of maternal bleeding, miscarriage, preterm delivery, severe Dengue, Dengue shock and maternal mortality, as well as foetal outcomes in terms of foetal distress, low birth weight and neonatal mortality.