Leveraging large language models for knowledge-free weak supervision in clinical natural language processing.

Sci Rep

McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA.

Published: March 2025

The performance of deep learning-based natural language processing systems depends on large amounts of labeled training data, which in the clinical domain are neither easily available nor affordable. Weak supervision and in-context learning offer partial solutions to this issue, particularly when using large language models (LLMs), but their performance still trails traditional supervised methods trained on moderate amounts of gold-standard data. Inference with LLMs is also computationally heavy. We propose an approach that combines LLM fine-tuning and weak supervision, requires virtually no domain knowledge, and still achieves consistently dominant performance. Using a prompt-based approach, the LLM generates weakly labeled data for training a downstream BERT model. The weakly supervised model is then further fine-tuned on small amounts of gold-standard data. We evaluate this approach using Llama2 on three different i2b2/n2c2 datasets for clinical named entity recognition. With no more than 10 gold-standard notes, our final BERT models weakly supervised by fine-tuned Llama2-13B consistently outperformed out-of-the-box PubMedBERT by 4.7% to 47.9% in F1 score. With only 50 gold-standard notes, our models achieved performance close to that of fully fine-tuned systems.
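
As a rough illustration of the weak-labeling step described above, the sketch below projects entity mentions returned by an LLM onto a tokenized sentence as BIO tags, which is the kind of input a BERT token classifier is typically trained on. The abstract does not specify the prompt format, the output format, or the tagging scheme, so the function name `to_bio`, the `(mention_text, entity_type)` pair format, and the example sentence are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Tuple


def to_bio(tokens: List[str], mentions: List[Tuple[str, str]]) -> List[str]:
    """Project (mention_text, entity_type) pairs onto tokens as BIO tags.

    Hypothetical helper: the mention format is an assumption, not taken
    from the paper.
    """
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    for text, etype in mentions:
        words = text.lower().split()
        n = len(words)
        if n == 0:
            continue
        for i in range(len(tokens) - n + 1):
            # Tag only exact matches that do not overlap an earlier tag.
            if lowered[i:i + n] == words and all(t == "O" for t in tags[i:i + n]):
                tags[i] = f"B-{etype}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{etype}"
    return tags


# Made-up sentence and made-up LLM output, for illustration only.
tokens = "Patient denies chest pain but reports shortness of breath".split()
weak_mentions = [("chest pain", "problem"), ("shortness of breath", "problem")]
weak_tags = to_bio(tokens, weak_mentions)
```

Under these assumptions, `weak_tags` pairs with `tokens` to form one weakly labeled training example. The same format would then be reused for the small gold-standard set in the second fine-tuning stage.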

DOI: http://dx.doi.org/10.1038/s41598-024-68168-2


Similar Publications

Frailty Trajectories Following Adjuvant Chemotherapy and Mortality in Older Women With Breast Cancer.

JAMA Netw Open

March 2025

Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill.

Emilie D Duchesneau Dae Hyun Kim Til Stürmer Qoua Her Zhang Zhang

Importance: Frailty assessed at a single time point is associated with mortality in older women with breast cancer. Little is known about how changes in frailty following cancer treatment initiation affect mortality.

Objective: To evaluate the association between claims-based frailty trajectories following adjuvant chemotherapy initiation and 5-year mortality in older women with stage I to III breast cancer.


Multidependency Graph Convolutional Networks and Contrastive Learning for Drug Repositioning.

J Chem Inf Model

March 2025

School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China.

Yanglan Gan Shengnan Li Guangwei Xu Cairong Yan Guobing Zou

The goal of drug repositioning is to expedite the drug development process by finding novel therapeutic applications for approved drugs. Using multifeature learning, different computational drug repositioning techniques have recently been introduced to predict possible drug-disease relationships. Nevertheless, current graph-based methods tend to model drug-disease interaction relationships without considering the semantic influence of node-specific side information on graphs.


Profile of Chief Medical Officers and performance of health zones in crisis contexts: a cross-sectional study in three provinces of the Eastern Democratic Republic of Congo.

Hum Resour Health

March 2025

Ecole Régionale de Santé Publique, Université Catholique de Bukavu, Avenue Michombero N° 02, Bukavu, Democratic Republic of Congo.

Rosine Bigirinama Jean-Corneille Lembebu Christian Chiribagula Pacifique Mwene-Batu Denis Porignon

Context: In crisis-affected health systems, the performance of health zones (also known as health districts) is challenged by recurrent armed conflicts and state fragility. The profiles of health zone managers and contextual factors can significantly influence the zones' ability to respond effectively to population health needs. This study explores these interactions to identify key factors associated with health zone performance in three provinces of the Eastern Democratic Republic of Congo (DRC), a region that has endured over three decades of conflict.


Unsupervised machine learning identifies biomarkers of disease progression in post-kala-azar dermal leishmaniasis in Sudan.

PLoS Negl Trop Dis

March 2025

WHO Collaborating Centre for Leishmaniasis. Spanish National Centre for Microbiology, Instituto de Salud Carlos III, Majadahonda (Madrid), Spain.

Ana Torres Brima Musa Younis Samuel Tesema Jose Carlos Solana Javier Moreno

Background: Post-kala-azar dermal leishmaniasis (PKDL) appears as a rash in some individuals who have recovered from visceral leishmaniasis caused by Leishmania donovani. Today, basic knowledge of this neglected disease remains limited, and how to predict its progression is largely unknown.

Methods and Findings: This study addresses the use of several biochemical, haematological and immunological variables, independently or through unsupervised machine learning (ML), to predict PKDL progression risk.


Chronic lymphocytic leukemia (CLL) screening and abnormality detection based on multi-layer fluorescence imaging signal enhancement and compensation.

J Cancer Res Clin Oncol

March 2025

School of Computer Science and Technology, Changchun University of Science and Technology, Changchun, 130032, China.

Lemin Shi Ping Gong Mingye Li Dianxin Song Hao Zhang

Purpose: Fluorescence in situ hybridization (FISH) plays a critical role in cancer screening but faces challenges in signal clarity and manual intervention. This study aims to enhance FISH signal clarity, improve screening efficiency, and reduce false negatives through an automated image acquisition and signal enhancement framework.

Methods: An automated workflow was developed, integrating a dynamic signal enhancement method that optimizes global and local features.
