Siamese KG-LSTM: A deep learning model for enriching UMLS Metathesaurus synonymy.

Tien T T Tran, Sy V Nghiem, Van T Le, Tho T Quan, Vinh Nguyen, Hong Yung Yip, Olivier Bodenreider

Int Conf Knowl Syst Eng

National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.

Published: November 2020

The Unified Medical Language System (UMLS) is a repository of medical terminology developed by the U.S. National Library of Medicine to improve the ability of computer systems to understand biomedical and health language. The UMLS Metathesaurus, one of the three UMLS knowledge sources, contains medical terms and their relationships. Because the number of medical terms has grown rapidly in recent years, the current construction process of the UMLS Metathesaurus, which depends heavily on lexical tools and human editors, is error-prone and time-consuming. This paper takes advantage of emerging deep learning models to predict whether pairs of biomedical terms in the Metathesaurus are synonyms or non-synonyms. Our learning approach focuses on a subset of specific terms instead of the whole Metathesaurus corpus. In particular, we train the models with biomedical terms from the Disorders semantic group. To strengthen the models, we enrich the inputs using different strategies, including synonyms and hierarchical relationships from source vocabularies. Our deep learning model adopts the Siamese KG-LSTM (Siamese Knowledge Graph - Long Short-Term Memory) architecture. The experimental results show that this approach yields excellent performance on the task of synonym detection for the Disorders semantic group in the Metathesaurus. This shows the potential of applying machine learning techniques in the UMLS Metathesaurus construction process. Although the work in this paper focuses only on the Disorders semantic group, we believe that the proposed method can be applied to other semantic groups in the UMLS Metathesaurus.
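The abstract does not describe the internals of the model, so the following is only a minimal sketch of the general Siamese idea, not the authors' implementation: one LSTM encoder with shared weights reads both terms, and the two final hidden states are compared. Everything here is an assumption made for illustration: the class `TinyLSTMEncoder`, the function `synonymy_score`, the hash-seeded token vectors that stand in for learned embeddings, the layer sizes, and the similarity `exp(-L1 distance)`. The weights are random and untrained, and the knowledge-graph enrichment mentioned in the abstract is left out.

```python
import math
import random

HIDDEN = 8  # hidden-state size (illustrative assumption)
EMBED = 6   # token-vector size (illustrative assumption)


def _matrix(rows, cols, rng):
    """Random weight matrix; a trained model would learn these values."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class TinyLSTMEncoder:
    """Untrained single-layer LSTM. One instance serves both Siamese branches."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        # One weight matrix and bias per gate: input, forget, output, candidate.
        self.W = {g: _matrix(HIDDEN, EMBED + HIDDEN, rng) for g in "ifoc"}
        self.b = {g: [0.0] * HIDDEN for g in "ifoc"}

    @staticmethod
    def embed(token):
        # Hypothetical stand-in for learned embeddings:
        # a fixed pseudo-random vector seeded by the token string.
        rng = random.Random(token)
        return [rng.uniform(-1.0, 1.0) for _ in range(EMBED)]

    def _gate(self, name, x, squash):
        return [squash(sum(w * v for w, v in zip(row, x)) + bias)
                for row, bias in zip(self.W[name], self.b[name])]

    def encode(self, term):
        """Run the LSTM over whitespace tokens and return the final hidden state."""
        h = [0.0] * HIDDEN
        c = [0.0] * HIDDEN

        def sigmoid(z):
            return 1.0 / (1.0 + math.exp(-z))

        for token in term.lower().split():
            x = self.embed(token) + h  # concatenate input and previous hidden state
            i = self._gate("i", x, sigmoid)
            f = self._gate("f", x, sigmoid)
            o = self._gate("o", x, sigmoid)
            g = self._gate("c", x, math.tanh)
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h


def synonymy_score(encoder, term_a, term_b):
    """Score in (0, 1]: exp of the negative L1 distance between the two encodings."""
    ha = encoder.encode(term_a)
    hb = encoder.encode(term_b)
    return math.exp(-sum(abs(a - b) for a, b in zip(ha, hb)))


encoder = TinyLSTMEncoder()
print(synonymy_score(encoder, "heart attack", "myocardial infarction"))
```

Because both terms pass through the same encoder, the score is symmetric in its arguments and equals 1 for identical strings. With random weights the score for a real synonym pair is not meaningful; in a trained model the shared weights would be fitted so that labeled synonym pairs score high and non-synonym pairs score low.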

Download full-text PDF	Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9584311	PMC
http://dx.doi.org/10.1109/kse50997.2020.9287797	DOI Listing

Similar Publications

Bridging language barriers in healthcare: a patient-centric mobile app for multilingual health record access and sharing.

Front Digit Health

February 2025

Department of Computer Science and Biomedical Engineering Research Centre, University of Cyprus, Nicosia, Cyprus.

Theodoros Solomou, Stelios Mappouras, Efthyvoulos Kyriacou, Ioannis Constantinou, Zinonas Antoniou

Introduction: Access to health data for patients is hindered by a fragmented healthcare system and the absence of unified, patient-centric solutions. Additionally, there are no mechanisms for easily sharing medical records with healthcare providers, which risks incomplete diagnoses. To make matters worse, when patients seek care abroad, language barriers may prevent foreign doctors from understanding their health data, complicating treatment.

Leveraging Medical Knowledge Graphs Into Large Language Models for Diagnosis Prediction: Design and Application Study.

JMIR AI

February 2025

Department of Medicine, University of Wisconsin-Madison, Madison, WI, United States.

Yanjun Gao, Ruizhe Li, Emma Croxford, John Caskey, Brian W Patterson

Background: Electronic health records (EHRs) and routine documentation practices play a vital role in patients' daily care, providing a holistic record of health, diagnoses, and treatment. However, complex and verbose EHR narratives can overwhelm health care providers, increasing the risk of diagnostic inaccuracies. While large language models (LLMs) have showcased their potential in diverse language tasks, their application in health care must prioritize the minimization of diagnostic errors and the prevention of patient harm.

Analysis of longitudinal social media for monitoring symptoms during a pandemic.

J Biomed Inform

February 2025

School of Public Health, Zhejiang University School of Medicine, Hangzhou 310058 China; Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA.

Shixu Lin, Lucas Garay, Yining Hua, Zhijiang Guo, Wanxin Li

Objective: Current studies leveraging social media data for disease monitoring face challenges such as noisy colloquial language and insufficient tracking of user disease progression in longitudinal data settings. This study aims to develop a pipeline for collecting, cleaning, and analyzing large-scale longitudinal social media data for disease monitoring, with a focus on the COVID-19 pandemic.

Materials And Methods: The pipeline begins by screening COVID-19 cases from tweets spanning February 1, 2020, to April 30, 2022.

Improving large language model applications in biomedicine with retrieval-augmented generation: a systematic review, meta-analysis, and clinical development guidelines.

J Am Med Inform Assoc

January 2025

Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37212, United States.

Siru Liu, Allison B McCoy, Adam Wright

Objective: The objectives of this study are to synthesize findings from recent research on retrieval-augmented generation (RAG) and large language models (LLMs) in biomedicine and to provide clinical development guidelines to improve effectiveness.

Materials And Methods: We conducted a systematic literature review and a meta-analysis. The report was created in adherence to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses 2020 analysis.

Leveraging Transformers-based models and linked data for deep phenotyping in radiology.

Comput Methods Programs Biomed

March 2025

Laberit, Avda. de Catalunya, 9, València, 46020, Spain.

Lluís-F Hurtado, Luis Marco-Ruiz, Encarna Segarra, Maria Jose Castro-Bleda, Aurelia Bustos-Moreno

Background And Objective: Despite significant investments in the normalization and the standardization of Electronic Health Records (EHRs), free text is still the rule rather than the exception in clinical notes. The use of free text has implications in data reuse methods used for supporting clinical research since the query mechanisms used in cohort definition and patient matching are mainly based on structured data and clinical terminologies. This study aims to develop a method for the secondary use of clinical text by: (a) using Natural Language Processing (NLP) for tagging clinical notes with biomedical terminology; and (b) designing an ontology that maps and classifies all the identified tags to various terminologies and allows for running phenotyping queries.
