Automatic gene annotation using GO terms from cellular component domain.

BMC Med Inform Decis Mak

Department of Computer and Information Science, University of Delaware, Newark, DE, 19716, USA.

Published: December 2018

Background: The Gene Ontology (GO) is a resource that supplies information about gene product function, using ontologies to represent biological knowledge. These ontologies cover three domains: Cellular Component (CC), Molecular Function (MF), and Biological Process (BP). GO annotation is the process of assigning functional information, expressed as GO terms, to the relevant genes discussed in the literature. It is a common task among the Model Organism Database (MOD) groups. Manual GO annotation relies on human curators who read the biomedical literature and assign GO terms to genes. This process is time-consuming and labor-intensive. As a result, many MODs can afford to curate only a fraction of the relevant articles.

Methods: GO terms from the CC domain can essentially be divided into two sub-hierarchies: subcellular location terms and protein complex terms. We cast the task of gene annotation using GO terms from the CC domain as relation extraction between genes and other entities: (1) extract cases where a protein is found to be in a subcellular location, and (2) extract cases where a protein is a subunit of a protein complex. For each relation extraction task, we use an approach based on triggers and syntactic dependencies to extract the desired relations among entities.
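The general idea of combining triggers with syntactic dependencies can be illustrated with a minimal sketch. This is not the authors' implementation: the trigger lexicons, the relation labels, the function names, and the hand-written dependency edges below are all assumptions made for illustration, and a real system would obtain the edges from a dependency parser. The sketch checks whether the shortest dependency path between a protein mention and a candidate entity passes through a trigger word.

```python
from collections import deque

# Hypothetical trigger lexicons (assumed; the paper's trigger lists are not given here).
LOCATION_TRIGGERS = {"localizes", "localized", "located", "found"}
COMPLEX_TRIGGERS = {"subunit", "component", "member"}


def shortest_path(edges, start, goal):
    """Breadth-first search over an undirected view of (head, dependent) edges."""
    graph = {}
    for head, dep in edges:
        graph.setdefault(head, set()).add(dep)
        graph.setdefault(dep, set()).add(head)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def extract_relation(edges, protein, entity):
    """Return a relation triple if a trigger lies on the path between the two mentions."""
    path = shortest_path(edges, protein, entity)
    if path is None:
        return None
    inner = set(path[1:-1])  # tokens strictly between the two mentions
    if inner & LOCATION_TRIGGERS:
        return ("located_in", protein, entity)
    if inner & COMPLEX_TRIGGERS:
        return ("subunit_of", protein, entity)
    return None


# Hand-written parse of "GeneX localizes to the nucleus" (tokens assumed unique).
location_edges = [("localizes", "GeneX"), ("localizes", "to"),
                  ("to", "nucleus"), ("nucleus", "the")]
# Hand-written parse of "GeneX is a subunit of the proteasome".
complex_edges = [("subunit", "GeneX"), ("subunit", "is"), ("subunit", "a"),
                 ("subunit", "of"), ("of", "proteasome"), ("proteasome", "the")]

location_result = extract_relation(location_edges, "GeneX", "nucleus")
complex_result = extract_relation(complex_edges, "GeneX", "proteasome")
print(location_result)
print(complex_result)
```

Using the tokens themselves as graph nodes keeps the sketch short but assumes each word occurs once per sentence; indexing tokens by position would remove that restriction.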

Results: We tested our approach on the BC4GO test set, a publicly available corpus for GO annotation. Our approach obtains an F1-score of 71%, a precision of 91%, and a recall of 58% for predicting GO terms from the CC domain for given genes.
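As a quick consistency check on the reported figures, the F1-score is the harmonic mean of precision and recall. The short sketch below, assuming the standard balanced F1 definition (the function name is illustrative), recomputes it from the stated precision and recall.

```python
def f1_score(precision, recall):
    """Balanced F1: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the Results (91% and 58%).
f1 = f1_score(0.91, 0.58)
print(f"{f1:.2f}")  # prints 0.71
```

The result rounds to the reported 71%, and the gap between precision and recall shows that the score is limited mainly by recall.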

Conclusions: We have described a novel approach that treats gene annotation with GO terms from the CC domain as two relation extraction subtasks. Evaluation results show that our approach achieves an F1-score of 71% for predicting GO terms for given genes. Our approach can therefore be used to accelerate the GO annotation process for bio-annotators.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6284271
DOI: http://dx.doi.org/10.1186/s12911-018-0694-7
