Large language models outperform traditional natural language processing methods in extracting patient-reported outcomes in IBD.

Perseus V Patel, Conner Davis, Amariel Ralbovsky, Daniel Tinoco, Christopher Y K Williams, Shadera Slatter, Behzad Naderalvojoud, Michael J Rosen, Tina Hernandez-Boussard, Vivek Rudrapatna

medRxiv

Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA.

Published: September 2024

Background and Aims: Patient-reported outcomes (PROs) are vital in assessing disease activity and treatment outcomes in inflammatory bowel disease (IBD). However, manual extraction of these PROs from the free text of clinical notes is burdensome. We aimed to improve data curation from free-text information in the electronic health record, making it more available for research and quality improvement. To that end, we compared traditional natural language processing (tNLP) and large language models (LLMs) in extracting three IBD PROs (abdominal pain, diarrhea, fecal blood) from clinical notes across two institutions.

Methods: Clinic notes were annotated for each PRO using preset protocols. Models were developed and internally tested at the University of California San Francisco (UCSF), and then externally validated at Stanford University. We compared tNLP and LLM-based models on accuracy, sensitivity, specificity, positive predictive value, and negative predictive value. Additionally, we conducted fairness and error assessments.
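The five comparison metrics named above can all be derived from a single confusion matrix. The following is a minimal sketch, assuming each PRO is scored as a binary label (1 = symptom present, 0 = absent); the function name `binary_metrics`, the `BinaryMetrics` container, and the example labels are illustrative assumptions and are not taken from the study.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BinaryMetrics:
    accuracy: float
    sensitivity: float
    specificity: float
    ppv: float  # positive predictive value
    npv: float  # negative predictive value


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> BinaryMetrics:
    """Compute the five metrics from reference labels and model predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    return BinaryMetrics(
        accuracy=_ratio(tp + tn, tp + tn + fp + fn),
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
    )


# Made-up labels for ten hypothetical notes (not study data).
reference = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = binary_metrics(reference, predicted)
print(metrics)
```

Returning NaN for an undefined ratio (for example, PPV when a model never predicts the positive class) is one possible design choice; it keeps a degenerate model from being silently scored as 0 or 1.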

Results: Inter-rater reliability between annotators was >90%. On the UCSF test set (n=50), the top-performing tNLP models achieved accuracies of 92% (abdominal pain), 82% (diarrhea), and 80% (fecal blood), comparable to GPT-4, which was 96%, 88%, and 90% accurate, respectively. On external validation at Stanford (n=250), tNLP models failed to generalize (61-62% accuracy) while GPT-4 maintained accuracies >90%. PaLM-2 and GPT-4 showed similar performance. No biases were detected based on demographics or diagnosis.

Conclusions: LLMs are accurate and generalizable methods for extracting PROs. They maintain excellent accuracy across institutions, despite heterogeneity in note templates and authors. Widespread adoption of such tools has the potential to enhance IBD research and patient care.

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11398594
DOI: http://dx.doi.org/10.1101/2024.09.05.24313139

Publication Analysis

Top Keywords

large language

language models

traditional natural

natural language

language processing

methods extracting

patient-reported outcomes

clinical notes

abdominal pain

fecal blood

Similar Publications

Clinical concept annotation with contextual word embedding in active transfer learning environment.

Digit Health

December 2024

School of Computer Science, University of Birmingham, Birmingham, UK.

Asim Abbas, Mark Lee, Niloofer Shanavas, Venelin Kovatchev

Objective: The study presents an active learning approach that automatically extracts clinical concepts from unstructured data and classifies them into explicit categories such as Problem, Treatment, and Test while preserving high precision and recall. The approach is demonstrated through experiments using i2b2 public datasets.

Methods: Initially, labeled data are acquired from a lexical-based approach in sufficient amounts to perform an active learning process. A contextual word embedding similarity approach is adopted, using BERT-based variant models such as ClinicalBERT, DistilBERT, and SciBERT, to automatically classify the unlabeled clinical concepts into explicit categories.

Simulate Scientific Reasoning with Multiple Large Language Models: An Application to Alzheimer's Disease Combinatorial Therapy.

medRxiv

December 2024

Qidi Xu, Xiaozhong Liu, Xiaoqian Jiang, Yejin Kim

Motivation: This study aims to develop an AI-driven framework that leverages large language models (LLMs) to simulate scientific reasoning and peer review to predict efficacious combinatorial therapy when data-driven prediction is infeasible.

Results: Our proposed framework achieved a significantly higher accuracy (0.74) than traditional knowledge-based prediction (0.

Recent updates of interferon-derived myxovirus resistance protein A as a biomarker for acute viral infection.

Eur J Med Res

December 2024

Department of Medical Laboratory Science, College of Medicine and Health Sciences, Debre Markos University, 269, Debre Markos, Ethiopia.

Desalegn Abebaw, Yibeltal Akelew, Adane Adugna, Zigale Hibstu Teffera, Habtamu Belew

Background: Antimicrobial resistance (AMR) remains a global public health threat with a high burden in sub-Saharan countries. The overuse of antimicrobials in the clinical setting is the main factor in the spread of antimicrobial resistance. Diagnostic uncertainty in differentiating between bacterial and viral infections is the major contributor to antimicrobial overuse.

Using a naive Bayesian approach to identify academic risk based on multiple sources: A conceptual replication.

J Sch Psychol

February 2025

Department of Educational Psychology, University of Wisconsin-Madison, United States.

Carly Oddleifson, Stephen Kilgus, David A Klingbeil, Alexander D Latham, Jessica S Kim

The purpose of this study was to conduct a conceptual replication of Pendergast et al.'s (2018) study that examined the diagnostic accuracy of a nomogram procedure, also known as a naive Bayesian approach. The specific naive Bayesian approach combined academic and social-emotional and behavioral (SEB) screening data to predict student performance on a state end-of-year achievement test.

Annotating publicly-available samples and studies using interpretable modeling of unstructured metadata.

Brief Bioinform

November 2024

Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, United States.

Hao Yuan, Parker Hicks, Mansooreh Ahmadian, Kayla A Johnson, Lydia Valtadoros

Reusing massive collections of publicly available biomedical data can significantly impact knowledge discovery. However, these public samples and studies are typically described using unstructured plain text, hindering the findability and further reuse of the data. To combat this problem, we propose txt2onto 2.
