The advent of patient access to complex medical information online has highlighted the need for simplification of biomedical text to improve patient understanding and engagement in taking ownership of their health. However, comprehension of biomedical text remains a difficult task due to the need for domain-specific expertise. We aimed to study the simplification of biomedical text via large language models (LLMs), which are commonly used for general natural language processing tasks that involve text comprehension, summarization, generation, and prediction of new text from prompts. Specifically, we fine-tuned three variants of large language models to substitute complex words and phrases in biomedical text with a related hypernym. The output of the substitution process was evaluated by comparing the pre- and post-substitution texts using four readability metrics and two measures of sentence complexity. A sample of 1,000 biomedical definitions from the National Library of Medicine's Unified Medical Language System (UMLS) was processed with the three LLM approaches, and each showed an improvement in readability and sentence complexity after hypernym substitution. Readability scores improved from a collegiate reading level before processing to a US high-school level after processing. Comparison between the three LLMs showed that the GPT-J-6b approach produced the greatest improvement in measures of sentence complexity. This study demonstrates the merit of hypernym substitution to improve readability of complex biomedical text for the public and highlights the use case for fine-tuning open-access large language models for biomedical natural language processing.
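
The evaluation idea described above, scoring a text before and after hypernym substitution, can be illustrated with a minimal sketch. Several things here are assumptions rather than the study's method: the abstract does not name its four readability metrics, so the Flesch-Kincaid grade level is used only as a familiar example; the `HYPERNYMS` lookup table is hypothetical and merely stands in for the fine-tuned LLMs; and the syllable counter is a rough heuristic.

```python
import re

# Hypothetical term-to-hypernym table. The study generated substitutions with
# fine-tuned LLMs; this lookup only stands in for that step.
HYPERNYMS = {
    "myocardial infarction": "heart condition",
    "hepatocellular carcinoma": "liver cancer",
}


def substitute_hypernyms(text, table=HYPERNYMS):
    """Replace each known term with its hypernym, longest terms first."""
    for term in sorted(table, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        text = re.sub(pattern, table[term], text, flags=re.IGNORECASE)
    return text


def count_syllables(word):
    """Rough heuristic: count vowel groups, dropping a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


if __name__ == "__main__":
    original = "Hepatocellular carcinoma is common."
    simplified = substitute_hypernyms(original)
    print(simplified)
    print(flesch_kincaid_grade(original), flesch_kincaid_grade(simplified))
```

Shorter, more familiar hypernyms lower the syllables-per-word term, which is why the grade level drops after substitution. A production evaluation would more likely rely on an established readability library than on this hand-rolled syllable count.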

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11020904
DOI: http://dx.doi.org/10.1371/journal.pdig.0000489

Publication Analysis

Top Keywords

biomedical text: 24
large language: 16
language models: 16
hypernym substitution: 12
sentence complexity: 12
biomedical: 8
simplification biomedical: 8
text: 8
natural language: 8
language processing: 8

Similar Publications

Technology and Dementia Preconference.

Alzheimers Dement

December 2024

UT Health San Antonio, San Antonio, TX, USA.

Background: Primary progressive aphasia (PPA) is a language-led dementia associated with underlying Alzheimer's disease (AD) or frontotemporal lobar degeneration pathology. As part of the Alzheimer's spectrum, logopenic (lv) PPA may be particularly difficult to distinguish from amnestic AD, due to overlapping clinical features. Analysis of linguistic and acoustic variables derived from connected speech has shown promise as a diagnostic tool for differentiating dementia subtypes.

Effects and mechanisms of computerized cognitive training in Huntington's disease: protocol for a pilot study.

Neurodegener Dis Manag

January 2025

Turner Institute for Brain & Mental Health, School of Psychological Sciences, Faculty of Medicine, Nursing & Health Sciences, 18 Innovation Walk, Monash University, Clayton VIC 3800, Australia.

Huntington's disease (HD) causes progressive cognitive decline, with no available treatments. Computerized cognitive training (CCT) has shown efficacy in other populations, but its effects in HD are largely unknown. This pilot study will explore the effects and neural mechanisms of CCT in HD.

Background: The inadequate inclusion of sex and gender in medical research has resulted in biased clinical guidance and disparities in knowledge and patient outcomes. Despite efforts by regulatory and funding agencies, opportunities to generate sex-specific knowledge are frequently overlooked. While certain disciplines in cardiovascular medicine have made notable progress, these advances have yet to permeate the literature on perioperative cardiovascular complications in non-cardiac surgery.

Biomedical datasets are the mainstays of computational biology and health informatics projects, and can be found on multiple data platforms online or obtained from wet-lab biologists and physicians. The quality and the trustworthiness of these datasets, however, can sometimes be poor, producing bad results in turn, which can harm patients and data subjects. To address this problem, policy-makers, researchers, and consortia have proposed diverse regulations, guidelines, and scores to assess the quality and increase the reliability of datasets.

Clinical decision-making is driven by multimodal data, including clinical notes and pathological characteristics. Artificial intelligence approaches that can effectively integrate multimodal data hold significant promise in advancing clinical care. However, the scarcity of well-annotated multimodal datasets in clinical settings has hindered the development of useful models.
