Background: Medical lexicons enable the natural language processing (NLP) of health texts. Lexicons gather terms and concepts from thesauri and ontologies, and linguistic data for part-of-speech (PoS) tagging, lemmatization or natural language generation. To date, there is no such type of resource for Spanish.
Construction and Content: This article describes a unified medical lexicon for medical natural language processing in Spanish. MedLexSp includes terms and inflected word forms with PoS information and Unified Medical Language System (UMLS) semantic types, groups and Concept Unique Identifiers (CUIs). To create it, we used NLP techniques and domain corpora (e.g. MedlinePlus). We also collected terms from the Dictionary of Medical Terms from the Spanish Royal Academy of Medicine, the Medical Subject Headings (MeSH), the Systematized Nomenclature of Medicine - Clinical Terms (SNOMED-CT), the Medical Dictionary for Regulatory Activities Terminology (MedDRA), the International Classification of Diseases, version 10, the Anatomical Therapeutic Chemical Classification, the National Cancer Institute (NCI) Dictionary, the Online Mendelian Inheritance in Man (OMIM) and OrphaData. Terms related to COVID-19 were assembled by applying a similarity-based approach with word embeddings trained on a large corpus. MedLexSp includes 100 887 lemmas, 302 543 inflected forms (conjugated verbs, and number/gender variants), and 42 958 UMLS CUIs. We report two use cases of MedLexSp. First, we applied the lexicon to pre-annotate a corpus of 1200 texts related to clinical trials. Second, we used it for PoS tagging and lemmatizing texts about clinical cases. MedLexSp improved the scores for PoS tagging and lemmatization compared to the default spaCy and Stanza Python libraries.
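The similarity-based collection of COVID-19 terms can be illustrated with a minimal sketch. The abstract does not state which similarity measure or threshold was used, so cosine similarity, the threshold value, the function names and the three-dimensional toy vectors below are all assumptions made for illustration, not the authors' implementation.

```python
import math

# Toy vectors standing in for embeddings trained on a large corpus.
# The values are invented for illustration only.
TOY_VECTORS = {
    "covid-19": [0.9, 0.1, 0.0],
    "coronavirus": [0.8, 0.2, 0.1],
    "sars-cov-2": [0.85, 0.15, 0.05],
    "fractura": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_terms(seed, vectors, threshold=0.9):
    """Return (term, score) pairs similar to the seed, best first.

    The threshold is a hypothetical cut-off; candidates would still
    need manual review before entering a lexicon.
    """
    seed_vec = vectors[seed]
    scored = [
        (term, cosine(seed_vec, vec))
        for term, vec in vectors.items()
        if term != seed
    ]
    kept = [(term, score) for term, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


candidates = nearest_terms("covid-19", TOY_VECTORS)
```

In this toy setting the unrelated term falls below the threshold and is discarded, while the two related terms are kept as candidates.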
Conclusions: The lexicon is distributed in a delimiter-separated value file; an XML file with the Lexical Markup Framework; a lemmatizer module for the spaCy and Stanza libraries; and complementary Lexical Record (LR) files. The embeddings and code to extract COVID-19 terms, and the spaCy and Stanza lemmatizers enriched with medical terms are provided in a public repository.
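As a rough illustration of how a delimiter-separated lexicon could drive lemmatization, the sketch below loads a small in-memory sample and looks up lemmas by word form and PoS tag. The column names, the pipe delimiter, the sample rows and the placeholder CUI values are assumptions; the actual MedLexSp file layout should be checked in the distributed resource.

```python
import csv
import io

# Hypothetical sample; column names, delimiter and CUI placeholders
# are assumptions, not the real MedLexSp layout.
SAMPLE = """form|lemma|pos|cui
pulmones|pulmón|NOUN|CUI_PLACEHOLDER_1
pulmón|pulmón|NOUN|CUI_PLACEHOLDER_1
vacunas|vacuna|NOUN|CUI_PLACEHOLDER_2
vacunados|vacunar|VERB|CUI_PLACEHOLDER_3
"""


def load_lexicon(handle, delimiter="|"):
    """Index rows by (lowercased form, PoS) for constant-time lookup."""
    table = {}
    for row in csv.DictReader(handle, delimiter=delimiter):
        table[(row["form"].lower(), row["pos"])] = (row["lemma"], row["cui"])
    return table


def lemmatize(table, form, pos):
    """Return the lexicon lemma, or the form itself if it is not listed."""
    entry = table.get((form.lower(), pos))
    return entry[0] if entry else form


lexicon = load_lexicon(io.StringIO(SAMPLE))
```

Keying on both form and PoS tag lets one surface form map to different lemmas depending on its category, which is why the lookup is typically run after PoS tagging.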
Download full-text PDF |
Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9892682 | PMC |
http://dx.doi.org/10.1186/s13326-022-00281-5 | DOI Listing |
J Periodontol
January 2025
Stomatology Hospital, School of Stomatology, Zhejiang University School of Medicine, Zhejiang Provincial Clinical Research Center for Oral Diseases, Key Laboratory of Oral Biomedical Research of Zhejiang Province, Cancer Center of Zhejiang University, Hangzhou, China.
Background: The clinical evidence about alveolar ridge changes following molar extraction and how the alveolar bone morphology influences the ridge dimensional changes remains limited.
Methods: A total of 192 patients with 199 molar extractions were included in this retrospective study. Cone-beam computed tomography (CBCT) images of patients were obtained 0-3 months before extraction and 6-12 months after extraction.
Alzheimers Dement
January 2025
Laboratory for Cognitive Neurology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Leuven, Belgium.
Introduction: The automated analysis of connected speech using natural language processing (NLP) emerges as a possible biomarker for Alzheimer's disease (AD). However, it remains unclear which types of connected speech are most sensitive and specific for the detection of AD.
Methods: We applied a language model to automatically transcribed connected speech from 114 Flemish-speaking individuals to first distinguish early AD patients from amyloid negative cognitively unimpaired (CU) and then amyloid negative from amyloid positive CU individuals using five different types of connected speech.
J Magn Reson Imaging
January 2025
Department of Radiology, Peking University Third Hospital, Beijing, China.
Background: The spinal column is a frequent site for metastases, affecting over 30% of solid tumor patients. Identifying the primary tumor is essential for guiding clinical decisions but often requires resource-intensive diagnostics.
Purpose: To develop and validate artificial intelligence (AI) models using noncontrast MRI to identify primary sites of spinal metastases, aiming to enhance diagnostic efficiency.
Int J Rheum Dis
January 2025
School of Nursing, Peking University, Beijing, People's Republic of China.
Background: In Chinese intervention studies, the lack of specific self-care scales based on the functional characteristics of rheumatoid arthritis (RA) patients has forced patients and researchers to spend a great deal of time completing multiple related scales. Therefore, the arthritis Self-Care Behaviors Scale (SCBS) was developed to evaluate the self-care behavior of patients with arthritis.
Objective: The objectives of this study were to translate the SCBS into Chinese and test its psychometric properties in Chinese patients with RA.
Natural products have long been a rich source of diverse and clinically effective drug candidates. Non-ribosomal peptides (NRPs), polyketides (PKs), and NRP-PK hybrids are three classes of natural products that display a broad range of bioactivities, including antibiotic, antifungal, anticancer, and immunosuppressant activities. However, discovering these compounds through traditional bioactivity-guided techniques is costly and time-consuming, often resulting in the rediscovery of known molecules.