MedLexSp - a medical lexicon for Spanish medical natural language processing.

J Biomed Semantics

Instituto de Lengua, Literatura y Antropología (ILLA), CSIC (Spanish National Research Council), Albasanz 26-28, 28037, Madrid, Spain.

Published: February 2023

Background: Medical lexicons enable the natural language processing (NLP) of health texts. Lexicons gather terms and concepts from thesauri and ontologies, and linguistic data for part-of-speech (PoS) tagging, lemmatization or natural language generation. To date, there is no such type of resource for Spanish.

Construction And Content: This article describes a unified medical lexicon for medical natural language processing in Spanish. MedLexSp includes terms and inflected word forms with PoS information and Unified Medical Language System (UMLS) semantic types, groups and Concept Unique Identifiers (CUIs). To create it, we used NLP techniques and domain corpora (e.g. MedlinePlus). We also collected terms from the Dictionary of Medical Terms from the Spanish Royal Academy of Medicine, the Medical Subject Headings (MeSH), the Systematized Nomenclature of Medicine - Clinical Terms (SNOMED-CT), the Medical Dictionary for Regulatory Activities Terminology (MedDRA), the International Classification of Diseases, version 10, the Anatomical Therapeutic Chemical Classification, the National Cancer Institute (NCI) Dictionary, the Online Mendelian Inheritance in Man (OMIM) and OrphaData. Terms related to COVID-19 were assembled by applying a similarity-based approach with word embeddings trained on a large corpus. MedLexSp includes 100 887 lemmas, 302 543 inflected forms (conjugated verbs, and number/gender variants), and 42 958 UMLS CUIs. We report two use cases of MedLexSp: first, applying the lexicon to pre-annotate a corpus of 1200 texts related to clinical trials; second, PoS tagging and lemmatizing texts about clinical cases. MedLexSp improved the scores for PoS tagging and lemmatization compared to the default spaCy and Stanza Python libraries.
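The abstract does not spell out the similarity-based approach used to assemble COVID-19 terms. As a rough illustration only, the sketch below ranks candidate terms by cosine similarity to a seed term. The toy three-dimensional vectors, the `nearest_terms` helper, and the `threshold` and `top_k` values are all hypothetical and are not taken from the article or its repository; real embeddings would be trained on a large corpus.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_terms(seed, embeddings, top_k=3, threshold=0.5):
    """Return up to top_k terms whose similarity to the seed is >= threshold."""
    seed_vec = embeddings[seed]
    scored = [
        (term, cosine(seed_vec, vec))
        for term, vec in embeddings.items()
        if term != seed
    ]
    scored = [(term, score) for term, score in scored if score >= threshold]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors invented for this example (assumption, not real embeddings).
TOY_EMBEDDINGS = {
    "covid-19": [0.9, 0.8, 0.1],
    "coronavirus": [0.85, 0.75, 0.2],
    "sars-cov-2": [0.8, 0.9, 0.15],
    "fractura": [0.1, 0.05, 0.95],
}

candidates = nearest_terms("covid-19", TOY_EMBEDDINGS)
candidate_terms = [term for term, _ in candidates]
```

In this toy setting the two coronavirus-related terms pass the threshold and the unrelated term does not. In practice, candidates retrieved this way would presumably still need manual review before entering a lexicon.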

Conclusions: The lexicon is distributed in a delimiter-separated value file; an XML file with the Lexical Markup Framework; a lemmatizer module for the spaCy and Stanza libraries; and complementary Lexical Record (LR) files. The embeddings and code to extract COVID-19 terms, and the spaCy and Stanza lemmatizers enriched with medical terms are provided in a public repository.
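To make the delimiter-separated distribution format more concrete, here is a minimal sketch of loading such a file into a lookup table and using it for lexicon-based lemmatization. The column names (`form`, `lemma`, `pos`, `cui`), the `|` delimiter, and the placeholder CUI values are assumptions made for illustration; the actual MedLexSp file layout may differ and should be checked against the distributed files.

```python
import csv
import io

# Hypothetical sample in an assumed layout: form|lemma|pos|cui.
# The CUI values are placeholders, not real UMLS identifiers.
SAMPLE = """form|lemma|pos|cui
fracturas|fractura|NOUN|CUI_PLACEHOLDER_1
diagnosticó|diagnosticar|VERB|CUI_PLACEHOLDER_2
pulmonares|pulmonar|ADJ|CUI_PLACEHOLDER_3
"""


def load_lexicon(handle, delimiter="|"):
    """Map (lowercased form, PoS) to (lemma, CUI) from a delimited file."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    lexicon = {}
    for row in reader:
        key = (row["form"].lower(), row["pos"])
        lexicon[key] = (row["lemma"], row["cui"])
    return lexicon


def lemmatize(token, pos, lexicon):
    """Return the lexicon lemma, or the lowercased token if it is unknown."""
    entry = lexicon.get((token.lower(), pos))
    return entry[0] if entry else token.lower()


lexicon = load_lexicon(io.StringIO(SAMPLE))
lemma_known = lemmatize("Fracturas", "NOUN", lexicon)
lemma_unknown = lemmatize("radiografías", "NOUN", lexicon)
```

Keying on both the word form and its PoS tag lets a lookup distinguish homographs that lemmatize differently as nouns and verbs. Falling back to the lowercased token keeps out-of-lexicon words unchanged instead of guessing a lemma.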

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9892682
DOI: http://dx.doi.org/10.1186/s13326-022-00281-5
