Ascle - A Python Natural Language Processing Toolkit for Medical Text Generation: Development and Evaluation Study.

Rui Yang Qingcheng Zeng Keen You Yujie Qiao Lucas Huang Chia-Chun Hsieh Benjamin Rosand Jeremy Goldwasser Amisha Dave Tiarnan Keenan Yuhe Ke Chuan Hong Nan Liu Emily Chew Dragomir Radev Zhiyong Lu Hua Xu Qingyu Chen Irene Li

J Med Internet Res

Information Technology Center, University of Tokyo, Kashiwa, Japan.

Published: October 2024

Medical texts are difficult to manage and time-consuming to curate manually, prompting the development of NLP algorithms to automate this process for improved efficiency in the biomedical field.
The study introduces Ascle, a user-friendly tool designed for biomedical researchers that offers generative functions like question-answering and text summarization, along with 12 essential NLP functions and search capabilities.
The authors fine-tuned 32 language models and validated the generated content through physician assessment; the results showed improvements in text generation tasks, with notable gains in the BLEU score for machine translation and the ROUGE-L score for question-answering.

Background: Medical texts present significant domain-specific challenges, and manually curating these texts is a time-consuming and labor-intensive process. To address this, natural language processing (NLP) algorithms have been developed to automate text processing. In the biomedical field, various toolkits for text processing exist, which have greatly improved the efficiency of handling unstructured text. However, these existing toolkits tend to emphasize different perspectives, and none of them offer generation capabilities, leaving a significant gap in the current offerings.

Objective: This study aims to describe the development and preliminary evaluation of Ascle. Ascle is tailored for biomedical researchers and clinical staff with an easy-to-use, all-in-one solution that requires minimal programming expertise. For the first time, Ascle provides 4 advanced and challenging generative functions: question-answering, text summarization, text simplification, and machine translation. In addition, Ascle integrates 12 essential NLP functions, along with query and search capabilities for clinical databases.

Methods: We fine-tuned 32 domain-specific language models and evaluated them thoroughly on 27 established benchmarks. For the question-answering task, we developed a retrieval-augmented generation (RAG) framework for large language models that incorporated a medical knowledge graph with ranking techniques to enhance the reliability of generated answers. Finally, we conducted a physician validation to assess the quality of generated content beyond automated metrics.
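The general idea of combining a knowledge graph with ranking in a RAG pipeline can be illustrated with a minimal sketch. This is not Ascle's implementation: the `Triple` class, the token-overlap ranking, the prompt layout, and the three toy facts are all assumptions made for illustration, and the abstract does not specify which ranking techniques the authors used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One (head, relation, tail) edge of a toy knowledge graph (hypothetical)."""
    head: str
    relation: str
    tail: str

    def as_text(self):
        return f"{self.head} {self.relation.replace('_', ' ')} {self.tail}"


def tokenize(text):
    """Lowercased set of tokens with surrounding punctuation stripped."""
    stripped = (tok.strip(".,?!").lower() for tok in text.split())
    return {tok for tok in stripped if tok}


def rank_triples(question, triples, top_k=2):
    """Rank triples by token overlap with the question (a stand-in ranker)."""
    q_tokens = tokenize(question)
    scored = [(len(q_tokens & tokenize(t.as_text())), t) for t in triples]
    scored = [pair for pair in scored if pair[0] > 0]  # drop unrelated facts
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [t for _, t in scored[:top_k]]


def build_prompt(question, triples, top_k=2):
    """Prepend the top-ranked facts to the question before calling an LLM."""
    facts = rank_triples(question, triples, top_k)
    context = "\n".join(f"- {t.as_text()}" for t in facts)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Toy graph for illustration only; not taken from the paper.
TOY_GRAPH = [
    Triple("aspirin", "inhibits", "cyclooxygenase"),
    Triple("metformin", "treats", "type 2 diabetes"),
    Triple("insulin", "lowers", "blood glucose"),
]

if __name__ == "__main__":
    print(build_prompt("What does metformin treat?", TOY_GRAPH))
```

A real system would replace the overlap score with a stronger ranker (for example, embedding similarity) and pass the resulting prompt to a language model; the sketch stops at prompt construction.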

Results: The fine-tuned models and RAG framework consistently enhanced text generation tasks. For example, the fine-tuned models improved the machine translation task by 20.27 in terms of BLEU score. In the question-answering task, the RAG framework raised the ROUGE-L score by 18% over the vanilla models. Physician validation of generated answers showed high scores for readability (4.95/5) and relevancy (4.43/5), with lower scores for accuracy (3.90/5) and completeness (3.31/5).
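For readers unfamiliar with ROUGE-L, the metric scores a generated answer by its longest common subsequence (LCS) with a reference answer. The sketch below computes the common F-measure variant on whitespace tokens. It assumes equal weighting of precision and recall and simple lowercasing; the abstract does not state which ROUGE-L configuration or tokenizer the authors used, and the function names are illustrative.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L F1 on lowercased whitespace tokens (assumed tokenization)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    score = rouge_l_f1("the cat sat on the mat", "the cat lay on the mat")
    print(f"ROUGE-L F1: {score:.4f}")
```

In the example, five of the six tokens form the common subsequence, so precision and recall are both 5/6.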

Conclusions: This study introduces the development and evaluation of Ascle, a user-friendly NLP toolkit designed for medical text generation. All code is publicly available through the Ascle GitHub repository. All fine-tuned language models can be accessed through Hugging Face.

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11487205
DOI: http://dx.doi.org/10.2196/60601


