Detecting missing IS-A relations in the NCI Thesaurus using an enhanced hybrid approach.

Fengbo Zheng, Rashmie Abeysinghe, Nicholas Sioutos, Lori Whiteman, Lyubov Remennik, Licong Cui

BMC Med Inform Decis Mak

School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA.

Published: December 2020

Background: The National Cancer Institute (NCI) Thesaurus provides reference terminology for NCI and other systems. Previously, we proposed a hybrid prototype that uses lexical features and role definitions of concepts in non-lattice subgraphs to identify missing IS-A relations in the NCI Thesaurus. However, that work included no evaluation by domain experts. In this paper, we enhance the hybrid approach with a novel lexical feature: the roots of noun chunks within concept names. We also perform a formal evaluation of the enhanced approach.

Method: We first compute all the non-lattice subgraphs in the NCI Thesaurus. We model each concept using its role definitions, together with the words and the roots of noun chunks in its own name and in the names of its ancestors. We then perform subsumption testing on candidate concept pairs in the non-lattice subgraphs to automatically detect potentially missing IS-A relations. Domain experts evaluated the validity of these relations.
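The subsumption test described above can be sketched as a set-containment check. This is a minimal illustration, not the authors' implementation: it assumes that a concept is a candidate child of another when every role definition, word, and noun-chunk root of the more general concept also appears in the more specific one. The class `ConceptModel`, the functions `is_subsumed_by` and `suggest_missing_is_a`, and the toy concepts with their hand-written features are all hypothetical; in practice the noun-chunk roots would come from an NLP parser, and the candidate pairs from the non-lattice subgraphs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptModel:
    """Hypothetical feature model of one concept (assumption, not the paper's code)."""
    name: str
    roles: frozenset = frozenset()  # (role, value) pairs from role definitions
    words: frozenset = frozenset()  # words from the concept and ancestor names
    roots: frozenset = frozenset()  # roots of noun chunks in those names


def is_subsumed_by(child: ConceptModel, parent: ConceptModel) -> bool:
    """Assumed test: the child carries every feature of the parent."""
    return (
        parent.roles <= child.roles
        and parent.words <= child.words
        and parent.roots <= child.roots
    )


def suggest_missing_is_a(candidates, existing_is_a):
    """Return (child, parent) name pairs that pass the test but are not yet asserted."""
    suggestions = []
    for a, b in candidates:
        # Try both directions, since a candidate pair is unordered.
        for child, parent in ((a, b), (b, a)):
            if (child.name, parent.name) in existing_is_a:
                continue
            # Require strict containment so identical models are not linked.
            if is_subsumed_by(child, parent) and not is_subsumed_by(parent, child):
                suggestions.append((child.name, parent.name))
    return suggestions


# Toy data with hand-written features (illustrative only, not taken from the NCI Thesaurus).
general = ConceptModel(
    name="Lung Carcinoma",
    roles=frozenset({("has_site", "lung")}),
    words=frozenset({"lung", "carcinoma"}),
    roots=frozenset({"carcinoma"}),
)
specific = ConceptModel(
    name="Stage I Lung Carcinoma",
    roles=frozenset({("has_site", "lung"), ("has_stage", "stage i")}),
    words=frozenset({"stage", "i", "lung", "carcinoma"}),
    roots=frozenset({"carcinoma"}),
)

suggestions = suggest_missing_is_a([(general, specific)], existing_is_a=set())
print(suggestions)
```

Any pair suggested this way is only a candidate; as in the study, it would still need review by domain experts before being added to the terminology.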

Results: We applied our approach to the 19.08d version of the NCI Thesaurus. Our approach identified a total of 55 potentially missing IS-A relations, all of which were reviewed by domain experts. Of these, 29 were confirmed as valid and have been incorporated into newer versions of the NCI Thesaurus. A further 7 of the 55 revealed incorrect existing IS-A relations in the NCI Thesaurus.

Conclusions: The results show that our hybrid approach, which leverages lexical features and role definitions, is effective in identifying potentially missing IS-A relations in the NCI Thesaurus.

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7737275
DOI: http://dx.doi.org/10.1186/s12911-020-01289-6

Top Keywords: nci thesaurus; is-a relations; missing is-a; relations nci; hybrid approach; role definitions; non-lattice subgraphs; domain experts; nci; lexical features

Similar Publications

Evaluation of large language models for discovery of gene set function.

Nat Methods

January 2025

Department of Medicine, University of California San Diego, La Jolla, CA, USA.

Mengzhou Hu, Sahar Alkhairy, Ingoo Lee, Rudolf T Pillich, Dylan Fong

Gene set enrichment is a mainstay of functional genomics, but it relies on gene function databases that are incomplete. Here we evaluate five large language models (LLMs) for their ability to discover the common functions represented by a gene set, supported by molecular rationale and a self-confidence assessment. For curated gene sets from Gene Ontology, GPT-4 suggests functions similar to the curated name in 73% of cases, with higher self-confidence predicting higher similarity.

Identifying Patients With Primary Biliary Cholangitis and Cirrhosis Using Administrative Data in a National Cohort.

Pharmacoepidemiol Drug Saf

October 2024

Department of Health Behavior and Policy, Virginia Commonwealth University, Richmond, Virginia, USA.

Binu V John, Dustin Bastaich, Bassam Dahman

Background: Administrative codes may not accurately capture patients with both primary biliary cholangitis (PBC) and cirrhosis, because the old nomenclature "Primary Biliary Cirrhosis" invites incorrect coding. Therefore, the aim of this study was to examine the positive predictive value (PPV) of International Classification of Diseases (ICD) codes for PBC and cirrhosis.

Methods: This was a retrospective cohort study using data from the VA Corporate Data Warehouse.

Cancer of the Larynx-20-Year Comparative Survival and Mortality Analysis by Age, Sex, Race, Stage, Grade, Cohort Entry Time-Period, Disease Duration and ICD-O-3 Topographic Primary Sites-Codes C32.0-9: A Systematic Review of 43,103 Cases for Diagnosis Years 1975-2017: (NCI SEER*Stat 8.3.9).

J Insur Med

July 2024

Anthony F Milano

Background: Laryngeal malignancy, "voice box" cancer, is uncommon, with 12,620 estimated new cases and 3770 deaths in the United States in 2021 [1], and represents only 6.2% of all respiratory system malignancies.

Depression prevalence of the Geriatric Depression Scale-15 was compared to Structured Clinical Interview for DSM using individual participant data meta-analysis.

Sci Rep

July 2024

Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada.

Marc Parsons, Lu Qiu, Brooke Levis, Suiqiong Fan, Ying Sun

Article Synopsis

The Geriatric Depression Scale-15 (GDS-15), commonly used to gauge depression in older adults, yields a higher prevalence estimate at a cutoff of ≥5 (34.2%) than the Structured Clinical Interview for DSM (SCID) does (14.8%).
An analysis of data from 14 studies involving over 3,600 participants found that GDS-15 with a cutoff of ≥8 aligns much more closely with SCID results, with only a minor difference (-0.3%).
While GDS-15 ≥5 greatly overestimates depression prevalence, the suggested cutoff of ≥8 might be more accurate but has too much variation to be reliably implemented; hence, validated diagnostic

CCPA: cloud-based, self-learning modules for consensus pathway analysis using GO, KEGG and Reactome.

Brief Bioinform

July 2024

Department of Computer Science and Software Engineering, Auburn University, AL 36849, USA.

Ha Nguyen, Van-Dung Pham, Hung Nguyen, Bang Tran, Juli Petereit

This manuscript describes the development of a resource module that is part of a learning platform named 'NIGMS Sandbox for Cloud-based Learning' (https://github.com/NIGMS/NIGMS-Sandbox). The module delivers learning materials on Cloud-based Consensus Pathway Analysis in an interactive format that uses appropriate cloud resources for data access and analyses.
