A Bayesian Supertree Model for Genome-Wide Species Tree Reconstruction.

Syst Biol

Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, 36310, Spain.

Published: May 2016

AI Article Synopsis

  • This study emphasizes the need for species tree inference methods that can handle various sources of discrepancy between gene trees and species trees, such as gene duplication, incomplete lineage sorting (ILS), and horizontal gene transfer.
  • The authors propose a hierarchical Bayesian model that extends Maximum Likelihood supertrees, allowing for modular consideration of different sources of discordance without requiring ortholog identification or limiting analyses to single individuals per species.
  • Their new software, guenomu, ranked best among the compared methods on simulated data sets and performed well on empirical data. It also produced better gene tree estimates and is fast enough for relatively large data sets. The results further suggest that, under complex simulation scenarios, gene tree parsimony is a competitive approach once its speed is taken into account.

Article Abstract

Current phylogenomic data sets highlight the need for species tree methods able to deal with several sources of gene tree/species tree incongruence. At the same time, we need to make the most of all available data. Most species tree methods deal with a single process of phylogenetic discordance, namely, gene duplication and loss, incomplete lineage sorting (ILS), or horizontal gene transfer. In this manuscript, we address the problem of species tree inference from multilocus, genome-wide data sets regardless of the presence of gene duplication and loss and ILS, and therefore without the need to identify orthologs or to use a single individual per species. We do this by extending the idea of Maximum Likelihood (ML) supertrees to a hierarchical Bayesian model in which several sources of gene tree/species tree disagreement can be accounted for in a modular manner. We implemented this model in a computer program called guenomu, whose inputs are posterior distributions of unrooted gene tree topologies for multiple gene families and whose output is the posterior distribution of rooted species tree topologies. We conducted extensive simulations to evaluate the performance of our approach in comparison with other species tree approaches able to deal with more than one leaf from the same species. Our method ranked best on the simulated data sets, in spite of ignoring branch lengths, and performed well on empirical data, as well as being fast enough to analyze relatively large data sets. Our Bayesian supertree method was also very successful in obtaining better estimates of gene trees, by reducing the uncertainty in their distributions. In addition, our results show that under complex simulation scenarios, gene tree parsimony is also a competitive approach once we consider its speed, in contrast to more sophisticated models.
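The ML supertree idea that the abstract builds on scores a species tree S by how far each gene tree G lies from it, with P(G | S) proportional to exp(-beta * d(G, S)). The toy sketch below illustrates that idea under loudly stated assumptions; it is not guenomu's implementation. It assumes rooted, binary gene trees encoded as nested Python tuples whose leaves are species names (guenomu itself takes unrooted gene trees). It uses a single distance, the duplication count from LCA reconciliation (the quantity gene tree parsimony minimizes under a duplication-only cost), whereas the abstract describes combining several sources of disagreement in a modular way. It ignores the normalizing constant of P(G | S), averages exp(-beta * d) over posterior gene tree samples (a simplification that is only justified under assumptions such as a flat gene tree prior), and scores an explicit list of candidate species trees instead of exploring tree space. All function names and the toy data are hypothetical.

```python
import math
from typing import FrozenSet, List, Tuple, Union

# Hypothetical encoding: a rooted binary tree is either a leaf label (str)
# or a 2-tuple of subtrees. Gene-tree leaves carry species names, so the
# same species may appear more than once (paralogs, several individuals).
Tree = Union[str, Tuple["Tree", "Tree"]]


def leaf_set(tree: Tree) -> FrozenSet[str]:
    """Set of leaf labels (species) below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return leaf_set(tree[0]) | leaf_set(tree[1])


def clades(tree: Tree) -> List[FrozenSet[str]]:
    """Leaf sets of every node of a rooted species tree (leaves and root included)."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    return clades(tree[0]) + clades(tree[1]) + [leaf_set(tree)]


def lca(species: FrozenSet[str], species_clades: List[FrozenSet[str]]) -> FrozenSet[str]:
    """Smallest species-tree clade containing every species in `species`."""
    return min((c for c in species_clades if species <= c), key=len)


def count_duplications(gene_tree: Tree, species_tree: Tree) -> int:
    """Duplications implied by LCA reconciliation of a rooted gene tree:
    a gene node is a duplication if it maps to the same species node as
    one of its children."""
    species_clades = clades(species_tree)

    def visit(node: Tree) -> Tuple[FrozenSet[str], int]:
        # Returns (species clade this node maps to, duplications in its subtree).
        if isinstance(node, str):
            return lca(frozenset([node]), species_clades), 0
        m_left, d_left = visit(node[0])
        m_right, d_right = visit(node[1])
        m_node = lca(m_left | m_right, species_clades)
        is_dup = m_node == m_left or m_node == m_right
        return m_node, d_left + d_right + int(is_dup)

    return visit(gene_tree)[1]


def log_score(gene_families: List[List[Tree]], species_tree: Tree, beta: float) -> float:
    """Unnormalized log-likelihood of a species tree under P(G | S) ∝ exp(-beta * d(G, S)).

    Each gene family is a list of gene trees sampled from its posterior; its
    contribution is log(mean(exp(-beta * d))) over the samples, computed with
    the log-sum-exp trick. The normalizing constant of P(G | S) is ignored.
    """
    total = 0.0
    for samples in gene_families:
        logs = [-beta * count_duplications(g, species_tree) for g in samples]
        top = max(logs)
        total += top + math.log(sum(math.exp(x - top) for x in logs) / len(logs))
    return total


def posterior(scores: List[float]) -> List[float]:
    """Normalize log scores over an explicit candidate list (uniform prior)."""
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    return [w / sum(weights) for w in weights]


# Toy data: two candidate species trees and two gene families.
ape_hc = (("human", "chimp"), "gorilla")
ape_hg = (("human", "gorilla"), "chimp")
families = [
    [(("human", "chimp"), "gorilla")],    # single-copy family
    [((("human", "chimp"), "gorilla"),    # family with one duplication
      (("human", "chimp"), "gorilla"))],  # predating all speciations
]
scores = [log_score(families, s, beta=1.0) for s in (ape_hc, ape_hg)]
print(scores)              # [-1.0, -4.0]
print(posterior(scores))   # roughly [0.95, 0.05]
```

A larger beta concentrates the score on species trees that need fewer duplications, so in the limit the ranking matches duplication-only gene tree parsimony; adding further distances (for example deep coalescences for ILS) as extra weighted terms in the exponent is one way to read the "modular" combination the abstract mentions.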

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4851173
DOI: http://dx.doi.org/10.1093/sysbio/syu082
