Corpus domain effects on distributional semantic modeling of medical terms.

Bioinformatics

Institute for Health Informatics, University of Minnesota, Minneapolis, MN 55455, USA.

Published: December 2016

Motivation: Automatically quantifying semantic similarity and relatedness between clinical terms is an important aspect of text mining from electronic health records, which are increasingly recognized as valuable sources of phenotypic information for clinical genomics and bioinformatics research. A key obstacle to the development of semantic relatedness measures is the limited availability of large quantities of clinical text to researchers and developers outside of major medical centers. Text from general English and biomedical literature is freely available; however, its validity as a substitute for clinical-domain text in representing the semantics of clinical terms remains to be demonstrated.

Results: We constructed neural network representations of clinical terms found in a publicly available benchmark dataset manually labeled for semantic similarity and relatedness. Similarity and relatedness measures computed from text corpora in three domains (Clinical Notes, PubMed Central articles and Wikipedia) were compared using the benchmark as reference. We found that measures computed from the full text of biomedical articles in the PubMed Central repository (rho = 0.62 for similarity and 0.58 for relatedness) are on par with measures computed from clinical reports (rho = 0.60 for similarity and 0.57 for relatedness). We also evaluated the use of neural network-based relatedness measures for query expansion in a clinical document retrieval task and in a biomedical term word sense disambiguation task. We found that, with some limitations, biomedical articles may be used in lieu of clinical reports to represent the semantics of clinical terms and that distributional semantic methods are useful for clinical and biomedical natural language processing applications.
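The evaluation described above can be sketched as follows: score each benchmark term pair from its vectors, then correlate the model scores with the human ratings using Spearman's rho. This is a minimal illustration only. The abstract does not state which vector similarity function was used, so cosine similarity is an assumption here, and the term names, vectors and ratings are hypothetical toy values rather than data from the study.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical term vectors (real embeddings would have hundreds of dimensions).
vectors = {
    "term_a": [1.0, 0.0, 0.0],
    "term_b": [0.9, 0.1, 0.0],
    "term_c": [0.0, 1.0, 0.0],
    "term_d": [0.0, 0.0, 1.0],
}

# Hypothetical benchmark: term pairs with made-up human similarity ratings.
pairs = [("term_a", "term_b"), ("term_a", "term_c"),
         ("term_b", "term_c"), ("term_c", "term_d")]
human = [3.8, 1.2, 1.5, 1.0]

model_scores = [cosine(vectors[a], vectors[b]) for a, b in pairs]
rho = spearman(model_scores, human)
print(f"Spearman rho on toy data: {rho:.2f}")
```

Rank correlation is a natural fit for this kind of comparison because cosine scores and human ratings sit on different scales; only the ordering of the pairs needs to agree.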

Availability And Implementation: The software and reference standards used in this study to evaluate semantic similarity and relatedness measures are publicly available as detailed in the article.

Contact: pakh0002@umn.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5181540
DOI: http://dx.doi.org/10.1093/bioinformatics/btw529
