Multi-Ontology Refined Embeddings (MORE): A hybrid multi-ontology and corpus-based semantic representation model for biomedical concepts.

J Biomed Inform

Department of Computer Science, Dartmouth College, Hanover, NH 03755, USA; Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA; Department of Epidemiology, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA.

Published: November 2020

Objective: Currently, a major limitation for natural language processing (NLP) analyses in clinical applications is that concepts are not effectively referenced in various forms across different texts. This paper introduces Multi-Ontology Refined Embeddings (MORE), a novel hybrid framework that incorporates domain knowledge from multiple ontologies into a distributional semantic model, learned from a corpus of clinical text.

Materials And Methods: We use the RadCore and MIMIC-III free-text datasets for the corpus-based component of MORE. For the ontology-based part, we use the Medical Subject Headings (MeSH) ontology and three state-of-the-art ontology-based similarity measures. In our approach, we propose a new learning objective, modified from the sigmoid cross-entropy objective function.
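The abstract does not state the exact form of the modified objective, so the following is only a minimal sketch of one plausible reading: the hard 0/1 label of the usual sigmoid cross-entropy loss is replaced by a soft target in [0, 1], which for an ontology-linked pair could be an ontology-based similarity score. The function names and the soft-target scheme are assumptions for illustration, not the authors' published formulation.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sigmoid_cross_entropy(logit, target):
    """Numerically stable sigmoid cross-entropy for one logit.

    Equivalent to -(t * log(s) + (1 - t) * log(1 - s)) with s = sigmoid(logit),
    where the target t may be any value in [0, 1], not only 0 or 1.
    """
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def pair_loss(center_vec, context_vec, target):
    """Loss for one (center, context) pair of embedding vectors.

    Assumed convention (hypothetical): target = 1.0 for a pair observed in the
    corpus, 0.0 for a negative sample, and an ontology similarity in [0, 1]
    for a pair of concepts that both appear in the ontology.
    """
    return sigmoid_cross_entropy(dot(center_vec, context_vec), target)
```

With a soft target t, this loss is minimized when the sigmoid of the dot product equals t, so ontology-similar concepts would be pulled toward a dot product that reflects their similarity rather than toward the maximum.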

Results And Discussion: We used two established datasets of semantic similarities among biomedical concept pairs to evaluate the quality of the generated word embeddings. On the first dataset with 29 concept pairs, with similarity scores established by physicians and medical coders, MORE's similarity scores have the highest combined correlation (0.633), which is 5.0% higher than that of the baseline model, and 12.4% higher than that of the best ontology-based similarity measure. On the second dataset with 449 concept pairs, MORE's similarity scores have a correlation of 0.481, based on the average of four medical residents' similarity ratings, and that outperforms the skip-gram model by 8.1%, and the best ontology measure by 6.9%. Furthermore, MORE outperforms three pre-trained transformer-based word embedding models (i.e., BERT, ClinicalBERT, and BioBERT) on both datasets.
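The evaluation described above, correlating model similarity with human ratings over a set of concept pairs, can be sketched as follows. This is an illustrative sketch, not the authors' evaluation code: it assumes cosine similarity between embedding vectors and Pearson correlation (the abstract says only "correlation"), and the toy vectors and ratings are hypothetical placeholders rather than values from either benchmark.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def evaluate(embeddings, rated_pairs):
    """Correlate model similarities with human ratings.

    Pairs with a concept missing from the embedding table are skipped.
    """
    model_scores, human_scores = [], []
    for concept_a, concept_b, rating in rated_pairs:
        if concept_a in embeddings and concept_b in embeddings:
            model_scores.append(cosine(embeddings[concept_a], embeddings[concept_b]))
            human_scores.append(rating)
    return pearson(model_scores, human_scores)


# Hypothetical toy data, for illustration only.
toy_embeddings = {
    "renal failure": [0.90, 0.10, 0.20],
    "kidney failure": [0.85, 0.15, 0.25],
    "diabetes": [0.10, 0.90, 0.30],
    "asthma": [0.20, 0.10, 0.90],
}
toy_pairs = [
    ("renal failure", "kidney failure", 4.0),
    ("renal failure", "diabetes", 2.0),
    ("diabetes", "asthma", 1.5),
    ("asthma", "unknown concept", 1.0),  # skipped: no embedding available
]
score = evaluate(toy_embeddings, toy_pairs)
```

Skipping out-of-vocabulary pairs is one design choice among several; it affects how scores compare across models with different vocabularies, which is relevant to the extensibility claim in the conclusion.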

Conclusion: MORE incorporates knowledge from several biomedical ontologies into an existing corpus-based distributional semantics model, improving both the accuracy of the learned word embeddings and the extensibility of the model to a broader range of biomedical concepts. MORE allows for more accurate clustering of concepts across a wide range of applications, such as analyzing patient health records to identify subjects with similar pathologies, or integrating heterogeneous clinical data to improve interoperability between hospitals.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7665985
DOI: http://dx.doi.org/10.1016/j.jbi.2020.103581

Publication Analysis

Top Keywords

concept pairs: 12
similarity scores: 12
multi-ontology refined: 8
refined embeddings: 8
biomedical concepts: 8
ontology-based similarity: 8
word embeddings: 8
more's similarity: 8
model: 6
similarity: 6

Similar Publications

Risk of bias assessment tools often addressed items not related to risk of bias and used numerical scores.

J Clin Epidemiol

January 2025

Division of Nephrology and Hypertension, Department of Internal Medicine, University of Kansas Medical Centre, 3901 Rainbow Blvd, MS3002, Kansas City, KS, USA.

Objective: We aimed to determine whether the existing risk of bias assessment tools addressed constructs other than risk of bias or internal validity, and whether they used numerical scores to express quality, which is discouraged and may be a misleading approach.

Methods: We searched Ovid MEDLINE and Embase to identify quality appraisal tools across all disciplines in human health research. Tools designed specifically to evaluate reporting quality were excluded.

Using genetic data to infer evolutionary distances between molecular sequence pairs based on a Markov substitution model is a common procedure in phylogenetics, in particular for selecting a good starting tree to improve upon. Many evolutionary patterns can be accurately modelled using substitution models that are available in closed form, including the popular general time reversible model (GTR) for DNA data. For more complex biological phenomena, such as variations in lineage-specific evolutionary rates over time (heterotachy), other approaches such as the GTR with rate variation (GTR ) are required, but do not admit analytical solutions and do not automatically allow for likelihood calculations crucial for Bayesian analysis.

Objective: The study aimed to evaluate student knowledge and perceptions regarding career options and knowledge of the pharmaceutical industry based on pre-post module quizzes, reflections and team presentations in a hybrid medical affairs elective certificate course.

Methods: A qualitative research design was utilized to analyze reflections from 19 students enrolled in the Accreditation Council for Medical Affairs (ACMA) Pharmaceutical Industry Training Certificate elective at Marshall University during Spring 2023. The course utilized seven modules from the Medical Affairs Competency Certificate (MACC) offered by ACMA.

The embodied approach to language meaning suggests that negation with action verbs decreases activation of the negated concept, reflected in reduced motor-evoked potentials (MEPs) induced by transcranial magnetic stimulation (TMS). This study aims to explore how action negation influences inhibitory and facilitatory mechanisms within the primary motor cortex (M1) using paired-pulse TMS (ppTMS). We evaluated corticospinal excitability (CSE), short intracortical inhibition (SICI), indexing GABAA activity, and intracortical facilitation (ICF), related to glutamatergic activity.

This study advances microfluidic probe (MFP) technology through the development of a 3D-printed Microfluidic Mixing Probe (MMP), which integrates a built-in pre-mixer network of channels and features a lined array of paired injection and aspiration apertures. By combining the concepts of hydrodynamic flow confinements (HFCs) and "Christmas-tree" concentration gradient generation, the MMP can produce multiple concentration-varying flow dipoles, ranging from 0 to 100%, within an open microfluidic environment. This innovation overcomes previous limitations of MFPs, which only produced homogeneous bioreagents, by utilizing the pre-mixer to create distinct concentration of injected biochemicals.
