Identifying Human Phenotype Terms by Combining Machine Learning and Validation Rules.

Biomed Res Int

LaSIGE, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal.

Published: January 2018

Named-Entity Recognition is commonly used to identify biological entities such as proteins, genes, and chemical compounds in scientific articles. The Human Phenotype Ontology (HPO) provides a standardized vocabulary for phenotypic abnormalities found in human diseases. This article presents the Identifying Human Phenotypes (IHP) system, tuned to recognize HPO entities in unstructured text. IHP uses Stanford CoreNLP for text processing and applies Conditional Random Fields trained with a rich feature set that includes linguistic, orthographic, morphologic, lexical, and context features created for the machine learning-based classifier. The main novelty of IHP, however, is its validation step, based on a set of carefully crafted manual rules such as negative connotation analysis, which, combined with a dictionary, can filter incorrectly identified entities, find missed entities, and combine adjacent entities. The performance of IHP was evaluated on the recently published HPO Gold Standardized Corpora (GSC), on which the Bio-LarK CR system had obtained the best F-measure, 0.56. IHP achieved an F-measure of 0.65 on the GSC. Because of inconsistencies found in the GSC, an extended version was created by adding 881 entities and modifying 4 entities. IHP achieved an F-measure of 0.863 on this new GSC.
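The three validation actions named in the abstract (combining adjacent entities, filtering incorrect ones, and finding missed ones) can be illustrated with a minimal Python sketch. This is not the IHP implementation: the abstract does not specify the actual rules, so the toy dictionary, the `Span` type, and every function name below are hypothetical, and the filtering rule used here (keep only exact dictionary matches) is a deliberately simple stand-in for the hand-crafted rules such as negative connotation analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # token index, inclusive
    end: int    # token index, exclusive


# Hypothetical toy dictionary; a real system would load HPO term labels.
TOY_DICTIONARY = {"short stature", "cleft palate", "hearing loss"}


def text_of(tokens, span):
    """Lower-cased surface text covered by a span."""
    return " ".join(tokens[span.start:span.end]).lower()


def merge_adjacent(tokens, spans, dictionary):
    """Join two touching spans when the joined text is a dictionary term."""
    merged = []
    for span in sorted(spans, key=lambda s: s.start):
        if merged and merged[-1].end == span.start:
            candidate = Span(merged[-1].start, span.end)
            if text_of(tokens, candidate) in dictionary:
                merged[-1] = candidate
                continue
        merged.append(span)
    return merged


def filter_invalid(tokens, spans, dictionary):
    """Assumed rule: drop predicted spans that are not dictionary terms."""
    return [s for s in spans if text_of(tokens, s) in dictionary]


def find_missed(tokens, spans, dictionary, max_len=4):
    """Add untagged dictionary terms, preferring the longest match."""
    covered = {i for s in spans for i in range(s.start, s.end)}
    found = list(spans)
    for start in range(len(tokens)):
        for length in range(max_len, 0, -1):
            end = start + length
            if end > len(tokens):
                continue
            candidate = Span(start, end)
            if (covered.isdisjoint(range(start, end))
                    and text_of(tokens, candidate) in dictionary):
                found.append(candidate)
                covered.update(range(start, end))
                break
    return sorted(found, key=lambda s: s.start)


def validate(tokens, spans, dictionary=TOY_DICTIONARY):
    """Apply the three steps in order: merge, filter, then recover."""
    spans = merge_adjacent(tokens, spans, dictionary)
    spans = filter_invalid(tokens, spans, dictionary)
    return find_missed(tokens, spans, dictionary)


tokens = "Patients show short stature , cleft palate and tremor".split()
# Hypothetical classifier output: a split entity and one spurious entity.
predicted = [Span(2, 3), Span(3, 4), Span(8, 9)]
validated = validate(tokens, predicted)
validated_texts = [text_of(tokens, s) for s in validated]
```

In this toy run the split prediction "short" + "stature" is merged, "tremor" is dropped because it is absent from the toy dictionary, and the untagged "cleft palate" is recovered. A strict dictionary filter like this one would also discard correct entities that the dictionary lacks, which is one reason a real rule set needs to be more selective.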

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5700471
DOI: http://dx.doi.org/10.1155/2017/8565739
