Accurate orthology prediction is crucial for many applications in the post-genomic era. The lack of broadly accepted benchmark tests precludes a comprehensive analysis of orthology inference. So far, agreement in functional annotation between orthologs has served as a performance proxy. However, this conflicts with the definition of orthology as an evolutionary relationship, and it is often inapplicable because experimental functional evidence is limited for most species. Therefore, we constructed high-quality "gold standard" orthologous groups that can serve as a benchmark set for orthology inference in bilaterian species. Here, we use this dataset to demonstrate 1) why a manually curated, phylogeny-based dataset is more appropriate for benchmarking orthology than other popular practices and 2) how it guides database design and parameterization through careful error quantification. More specifically, we illustrate how function-based tests often fail to identify false assignments, misjudging the true performance of orthology inference methods. We also examine how our dataset can inform the selection of a "core" species repertoire to improve detection accuracy. We conclude that including more genomes at the proper evolutionary distances can influence the overall quality of orthology detection. The curated gene families, called Reference Orthologous Groups, are publicly available at http://eggnog.embl.de/orthobench2.
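
To make "error quantification" against curated groups more concrete, here is a minimal sketch of how a predicted orthologous group might be scored against one reference group. It assumes each group is simply a set of gene identifiers; the function names (group_errors, best_match_errors, count_fragments) and the toy gene IDs are hypothetical, and this is an illustration rather than the scoring scheme used with the Reference Orthologous Groups.

```python
from typing import Dict, Set, Tuple

# Hypothetical helpers (not from the paper): groups are sets of gene identifiers.


def group_errors(reference: Set[str], predicted: Set[str]) -> Tuple[int, int]:
    """Return (missing, extra) member counts for one predicted group."""
    missing = reference - predicted  # true orthologs the method left out
    extra = predicted - reference    # genes wrongly assigned to the group
    return len(missing), len(extra)


def best_match_errors(
    reference: Set[str], predictions: Dict[str, Set[str]]
) -> Tuple[str, int, int]:
    """Score the predicted group that shares the most members with the reference."""
    best_id = max(predictions, key=lambda gid: len(predictions[gid] & reference))
    missing, extra = group_errors(reference, predictions[best_id])
    return best_id, missing, extra


def count_fragments(reference: Set[str], predictions: Dict[str, Set[str]]) -> int:
    """Count predicted groups containing at least one reference member.

    A value above 1 means the reference group was split across several predictions.
    """
    return sum(1 for members in predictions.values() if members & reference)


if __name__ == "__main__":
    ref = {"human_A", "mouse_A", "fly_A", "worm_A"}
    preds = {
        "og1": {"human_A", "mouse_A", "fly_A", "fly_B"},
        "og2": {"worm_A"},
    }
    print(best_match_errors(ref, preds))  # ('og1', 1, 1)
    print(count_fragments(ref, preds))    # 2
```

Keeping missing members, extra members, and fragmentation as separate counts makes each error type visible on its own. For example, the wrongly included toy paralog fly_B shows up as one extra member, an error that a function-based test could overlook if the paralog carries annotation similar to the true ortholog.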

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4219706
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0111122

Similar Publications

The surge in genome data, with ongoing efforts aiming to sequence 1.5 million eukaryotic species within a decade, could revolutionize genomics, revealing the origins, evolution and genetic innovations of biological processes. Yet, traditional genomics methods scale poorly with such large datasets.

Background: The application of '-omics' technologies to study bacterial vaginosis (BV) has uncovered vast differences in composition and scale between the vaginal microbiomes of healthy and BV patients. Compared to amplicon sequencing and shotgun metagenomic approaches focusing on a single or few species, investigating the transcriptome of the vaginal microbiome at a system-wide level can provide insight into the functions which are actively expressed and differential between states of health and disease.

Results: We conducted a meta-analysis of vaginal metatranscriptomes from three studies, split into exploratory (n = 42) and validation (n = 297) datasets, accounting for the compositional nature of sequencing data and differences in scale between healthy and BV microbiomes.

CelEst: a unified gene regulatory network for estimating transcription factor activities in C. elegans.

Genetics

December 2024

Instituto de Biología Molecular de Barcelona (IBMB), CSIC, Parc Científic de Barcelona, C. Baldiri Reixac, 4-8, 08028 Barcelona, Spain.

Transcription factors (TFs) play a pivotal role in orchestrating intricate patterns of gene regulation. Although gene expression is complex, differential expression of hundreds of genes is often driven by just a handful of TFs. Despite extensive efforts to elucidate TF-target regulatory relationships in Caenorhabditis elegans, existing experimental datasets cover distinct subsets of TFs, which makes data integration challenging.

Motivation: Gene trees often differ from the species trees that contain them due to various factors, including incomplete lineage sorting (ILS) and gene duplication and loss (GDL). Several highly accurate species tree estimation methods have been introduced to explicitly address ILS, including ASTRAL, a widely used statistically consistent method, and wQFM, a quartet amalgamation approach experimentally shown to be more accurate than ASTRAL. Two recent advancements, ASTRAL-Pro and DISCO, have emerged in phylogenomics to consider GDL.

New developments for the Quest for Orthologs benchmark service.

NAR Genom Bioinform

December 2024

Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden.

The Quest for Orthologs (QfO) orthology benchmark service (https://orthology.benchmarkservice.org) hosts a wide range of standardized benchmarks for orthology inference evaluation.
