AI Article Synopsis

  • Genomes and transcriptomes are often difficult to analyze; identifying orthologs (corresponding genes across species) is a critical but challenging step in this process.
  • The Orthologous MAtrix (OMA) database is a key resource for finding orthologs, and the OMA pipeline can be run as a standalone program on Linux and Mac. On a cluster it supports the LSF, SGE, PBS Pro, and Slurm job schedulers and can scale up to thousands of parallel processes.
  • OMA standalone lets users combine their own data with public genomes exported from the OMA database. Applications include phylogenetic tree inference, identifying gene families that underwent duplications or losses in specific clades, and identifying potential drug targets in nonmodel organisms. It is open-source software released under the Mozilla Public License 2.0.

Article Abstract

Genomes and transcriptomes are now typically sequenced by individual laboratories, but analyzing them often remains challenging. One essential step in many analyses is identifying orthologs (corresponding genes across multiple species), but this is far from trivial. The Orthologous MAtrix (OMA) database is a leading resource for identifying orthologs among publicly available, complete genomes. Here, we describe the OMA pipeline available as a standalone program for Linux and Mac. When run on a cluster, it has native support for the LSF, SGE, PBS Pro, and Slurm job schedulers and can scale up to thousands of parallel processes. Another key feature of OMA standalone is that users can combine their own data with existing public data by exporting genomes and precomputed alignments from the OMA database, which currently contains over 2100 complete genomes. We compare OMA standalone to other methods in the context of phylogenetic tree inference, by inferring a phylogeny of Lophotrochozoa, a challenging clade within the protostomes. We also discuss other potential applications of OMA standalone, including identifying gene families having undergone duplications/losses in specific clades, and identifying potential drug targets in nonmodel organisms. OMA standalone is available under the permissive open source Mozilla Public License Version 2.0.
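As a rough illustration of what identifying orthologs between two genomes involves computationally, the sketch below finds reciprocal best hits from a table of pairwise alignment scores. It is a deliberately simplified stand-in and not OMA's own inference method, which is considerably more involved. The input format (a dict keyed by gene pairs holding one symmetric score per pair), the gene identifiers, and the function names are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical input: scores[(a, b)] is an alignment score between gene a
# (from genome A) and gene b (from genome B); higher means more similar.
# Assumption: one symmetric score per pair, used for both search directions.
Scores = Dict[Tuple[str, str], float]


def best_hits(scores: Scores, reverse: bool = False) -> Dict[str, str]:
    """Map each query gene to its highest-scoring target gene.

    With reverse=False, queries are genes of A and targets genes of B;
    with reverse=True the roles are swapped. Ties keep the first pair seen.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for (a, b), score in scores.items():
        query, target = (b, a) if reverse else (a, b)
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(scores: Scores) -> List[Tuple[str, str]]:
    """Return pairs (a, b) where b is a's best hit and a is b's best hit."""
    a_to_b = best_hits(scores)
    b_to_a = best_hits(scores, reverse=True)
    return sorted((a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a)
```

For example, with `scores = {("A1", "B1"): 250.0, ("A1", "B2"): 90.0, ("A2", "B1"): 120.0, ("A2", "B2"): 80.0}`, only `("A1", "B1")` is returned: A2's best hit is B1, but B1's best hit is A1, so A2 gets no reciprocal partner.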

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6633268
DOI: http://dx.doi.org/10.1101/gr.243212.118

Similar Publications

Knowledge of species phylogeny is critical to many fields of biology. Now that genome data are widely available, the most common way to build a species tree is from multiple protein-coding genes conserved across many species. This methodology comprises several steps: orthology inference, multiple sequence alignment, and inference of the phylogeny with dedicated tools.

Motivation: Accurate orthology inference is a fundamental step in many phylogenetic and comparative analyses. Many methods have been proposed, including OMA (Orthologous MAtrix). Yet substantial challenges remain, in particular in coping with fragmented genes or genes evolving at different rates after duplication, and in scaling to large datasets.

Orthology inference and other sequence analyses across multiple genomes typically start by performing exhaustive pairwise sequence comparisons, a process referred to as "all-against-all". As this process scales quadratically with the number of sequences analysed, this step can become a bottleneck, limiting the number of genomes that can be analysed simultaneously. Here, we explored ways of speeding up the all-against-all step while maintaining its sensitivity.
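The quadratic growth of the all-against-all step can be made concrete with a little arithmetic: n items yield n(n-1)/2 unordered pairs, so doubling n roughly quadruples the number of comparisons. The sketch also shows one simple way such work could be divided among parallel processes, by dealing genome pairs round-robin into jobs. The function names and the round-robin scheme are hypothetical illustrations and do not describe how OMA standalone actually schedules its jobs.

```python
from itertools import combinations
from typing import List, Tuple


def n_pairwise(n: int) -> int:
    """Number of unordered pairs among n items: n(n-1)/2."""
    return n * (n - 1) // 2


def partition_pairs(genomes: List[str], n_jobs: int) -> List[List[Tuple[str, str]]]:
    """Deal every unordered genome pair round-robin into n_jobs work lists.

    Hypothetical scheme: each job would then compare all sequences of the
    two genomes in each of its pairs, independently of the other jobs.
    """
    jobs: List[List[Tuple[str, str]]] = [[] for _ in range(n_jobs)]
    for i, pair in enumerate(combinations(genomes, 2)):
        jobs[i % n_jobs].append(pair)
    return jobs
```

For example, `n_pairwise(1000)` is 499,500 while `n_pairwise(2000)` is 1,999,000, about four times as many. Round-robin assignment keeps the number of pairs per job balanced, although real workloads also vary with genome size, so a production scheduler would likely need a smarter split.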

Article Synopsis
  • The study evaluates the accuracy of six software methods for identifying orthologous genes on simulated data, which offers more control than real data.
  • It explores how different evolutionary processes (such as gene duplication and lateral gene transfer) and technical issues (such as ambiguous sequences) affect the performance of these orthology inference methods.
  • The findings show that most methods handle gene duplication and loss well, but lateral gene transfer disrupts all of them, and ambiguous sequences hinder alignment-score-based methods noticeably more than distance-based ones.