Advances in gene ontology utilization improve statistical power of annotation enrichment.

PLoS One

Department of Molecular and Cellular Biochemistry, University of Kentucky, Lexington, KY, United States of America.

Published: March 2020

AI Article Synopsis

  • Gene-annotation enrichment uses ontology-based annotations in gene knowledgebases to enhance understanding of gene functions, but some relationships in these ontologies can lead to errors if not managed properly.
  • To address these challenges, the Gene Ontology Categorization Suite (GOcats) was developed to organize the Gene Ontology into user-defined subgraphs while ensuring the correctness of semantic relations.
  • GOcats demonstrated significant improvements in annotation enrichment in analyses of breast cancer and horse cartilage development datasets, revealing new biologically relevant terms that were missed by traditional methods.
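The synopsis notes that some ontology relationships cause errors if they are not handled properly. The toy sketch below contrasts two ancestor traversals:

  • A conventional one follows only `is_a` and `part_of` edges and drops every other relation.
  • The other also re-interprets `has_part` edges by reading them in reverse.

The term names, the edge list, and the reversal rule are hypothetical illustrations chosen for this sketch. They are not GOcats' actual relation handling or API.

```python
from collections import deque

# Hypothetical toy ontology: (subject, relation, object) edges with made-up term names.
# "term_B is_a term_A" reads as: term_B is a narrower concept than term_A.
EDGES = [
    ("term_B", "is_a", "term_A"),
    ("term_C", "part_of", "term_B"),
    ("term_D", "has_part", "term_C"),   # term_D contains term_C
    ("term_E", "regulates", "term_B"),  # not a scoping relation; never traversed here
]

# Relations followed from subject (narrower) to object (broader).
FORWARD_SCOPING = {"is_a", "part_of"}
# Assumption for illustration only: "X has_part Y" is treated as
# "Y lies within the scope of X", i.e. the edge is followed in reverse.
REVERSED_SCOPING = {"has_part"}


def build_parent_map(edges, reinterpret):
    """Map each term to the broader terms its annotations propagate to."""
    parents = {}
    for subj, rel, obj in edges:
        if rel in FORWARD_SCOPING:
            parents.setdefault(subj, set()).add(obj)
        elif reinterpret and rel in REVERSED_SCOPING:
            # Flip the edge so it points from the contained term to the container.
            parents.setdefault(obj, set()).add(subj)
        # Any other relation is omitted, as in a conventional traversal.
    return parents


def ancestors(term, parents):
    """Breadth-first search collecting every term reachable via kept edges."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


conventional = ancestors("term_C", build_parent_map(EDGES, reinterpret=False))
reinterpreted = ancestors("term_C", build_parent_map(EDGES, reinterpret=True))
print(sorted(conventional))   # ['term_A', 'term_B']
print(sorted(reinterpreted))  # ['term_A', 'term_B', 'term_D']
```

Under these assumptions, a gene annotated to `term_C` also counts toward `term_D` in the re-interpreted traversal. This shows how keeping an edge that would otherwise be dropped can raise the number of genes mapped to a broader term in an enrichment test.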

Article Abstract

Gene-annotation enrichment is a common method for utilizing ontology-based annotations in gene- and gene-product-centric knowledgebases. Effective utilization of these annotations requires inferring semantic linkages by tracing paths through edges in the ontological graph, referred to as relations. However, some relations are semantically problematic with respect to scope, necessitating their omission or modification lest erroneous term mappings occur. To address these issues, we created the Gene Ontology Categorization Suite, or GOcats, a novel tool that organizes the Gene Ontology into subgraphs representing user-defined concepts while ensuring that all appropriate relations are congruent with respect to scoping semantics. Here, we demonstrate the improvements in annotation enrichment gained by re-interpreting edges that would otherwise be omitted by traditional ancestor path-tracing methods. Specifically, we show that GOcats' unique handling of relations improves enrichment over conventional methods in the analysis of two different gene-expression datasets: a breast cancer microarray dataset and several horse cartilage development RNAseq datasets. With the breast cancer microarray dataset, enrichment improved for 182 of the 217 GO terms found significantly enriched by the conventional path traversal method when GOcats' path traversal was used instead, a significant improvement (one-sided binomial test p-value = 1.86E-25). Using GOcats, we also found new significantly enriched terms whose biological relevance has been experimentally demonstrated elsewhere. Likewise, on the horse RNAseq datasets, we observed a significant improvement in GO term enrichment when using GOcats' path traversal, with one-sided binomial test p-values ranging from 1.32E-03 to 2.58E-44.
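The breast cancer result summarizes 217 terms with a single one-sided binomial test. The sketch below shows that calculation. Its null hypothesis is our assumption, since the abstract does not state it: each term is equally likely to improve or not under GOcats, with success probability 0.5. The p-value is then the upper tail P(X >= 182) for X ~ Binomial(217, 0.5). The helper name `binomial_sf_one_sided` is hypothetical.

```python
from math import comb


def binomial_sf_one_sided(k, n):
    """P(X >= k) for X ~ Binomial(n, 0.5), summed exactly with integers.

    Assumes a success probability of 0.5 (improve vs. not improve).
    """
    hits = sum(comb(n, i) for i in range(k, n + 1))
    # Exact integer ratio, converted to a float only once at the end.
    return hits / 2 ** n


# Counts from the abstract: 182 of 217 terms improved under GOcats' traversal.
p = binomial_sf_one_sided(182, 217)
print(f"{p:.2e}")  # 1.86e-25
```

Under the p = 0.5 assumption, this sketch gives about 1.86e-25, matching the reported value. With SciPy installed, `scipy.stats.binomtest(182, 217, 0.5, alternative="greater").pvalue` computes the same tail. Exact integer summation is used here to avoid underflow or cancellation when summing such small tail probabilities.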


Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6695228 (PMC)
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0220728 (PLOS)

Publication Analysis

Top Keywords

gene ontology (12)
path traversal (12)
annotation enrichment (8)
datasets breast (8)
breast cancer (8)
cancer microarray (8)
microarray dataset (8)
rnaseq datasets (8)
observed improvement (8)
one-sided binomial (8)
