The Unified Medical Language System (UMLS) is a repository of medical terminology developed by the U.S. National Library of Medicine to improve the ability of computer systems to understand biomedical and health language. The UMLS Metathesaurus, one of the three UMLS knowledge sources, contains medical terms and their relationships. Because the number of medical terms has grown rapidly in recent years, the current construction process for the UMLS Metathesaurus, which depends heavily on lexical tools and human editors, is error-prone and time-consuming. This paper takes advantage of emerging deep learning models to learn to predict whether pairs of biomedical terms in the Metathesaurus are synonyms or non-synonyms. Our learning approach focuses on a subset of specific terms rather than the whole Metathesaurus corpus. In particular, we train the models on biomedical terms from the Disorders semantic group. To strengthen the models, we enrich the inputs using different strategies, including synonyms and hierarchical relationships from source vocabularies. Our deep learning model adopts the Siamese KG-LSTM (Siamese Knowledge Graph - Long Short-Term Memory) architecture. The experimental results show that this approach yields excellent performance on the task of synonym detection for the Disorders semantic group in the Metathesaurus, which shows the potential of applying machine learning techniques in the UMLS Metathesaurus construction process. Although the work in this paper focuses only on the Disorders semantic group, we believe that the proposed method can be applied to other semantic groups in the UMLS Metathesaurus.
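The pairwise framing of the task (synonym versus non-synonym pairs of terms) can be illustrated with a minimal sketch of how labeled pairs might be assembled from terms grouped by concept. This is not the paper's data pipeline: the `concepts` dictionary, its identifiers, and the `build_pairs` helper are hypothetical, and the assumption that terms under one concept are synonyms while terms under different concepts are not is a simplification made for illustration.

```python
from itertools import combinations

# Hypothetical toy data: concept identifier -> terms naming that concept.
# Identifiers and terms are illustrative, not drawn from the Metathesaurus.
concepts = {
    "C_A": ["heart attack", "myocardial infarction", "MI"],
    "C_B": ["high blood pressure", "hypertension"],
}

def build_pairs(concepts):
    """Return (term_a, term_b, label) triples; 1 = synonym, 0 = non-synonym."""
    pairs = []
    # Positive pairs: two distinct terms listed under the same concept.
    for terms in concepts.values():
        for a, b in combinations(terms, 2):
            pairs.append((a, b, 1))
    # Negative pairs: one term from each of two different concepts.
    for (_, terms_x), (_, terms_y) in combinations(concepts.items(), 2):
        for a in terms_x:
            for b in terms_y:
                pairs.append((a, b, 0))
    return pairs

pairs = build_pairs(concepts)
for term_a, term_b, label in pairs:
    print(label, term_a, "|", term_b)
```

A Siamese model would then encode each term of a pair with shared weights and be trained to predict the label; that part is omitted here because the abstract does not specify its details.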

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9584311
DOI: http://dx.doi.org/10.1109/kse50997.2020.9287797
