Evolutionary relationships are represented by phylogenetic trees, and a phylogenetic analysis of gene sequences typically produces a collection of these trees, one for each gene in the analysis. Analysis of samples of trees is difficult due to the multi-dimensionality of the space of possible trees. In Euclidean spaces, principal component analysis is a popular method of reducing high-dimensional data to a low-dimensional representation that preserves much of the sample's structure.
Most phylogenetic models assume that the evolutionary process is stationary and reversible. In addition to being biologically improbable, these assumptions also impair inference by generating models under which the likelihood does not depend on the position of the root. Consequently, the root of the tree cannot be inferred as part of the analysis.
Most existing measures of distance between phylogenetic trees are based on the geometry or topology of the trees. Instead, we consider distance measures which are based on the underlying probability distributions on genetic sequence data induced by trees. Monte Carlo schemes are necessary to calculate these distances approximately, and we describe efficient sampling procedures.
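As a rough illustration of the Monte Carlo idea, the sketch below estimates a distance between two probability distributions by sampling from one of them. It makes several assumptions that are not in the abstract: the distance is taken to be the Kullback-Leibler divergence, the two dictionaries are hypothetical toy distributions over four site patterns standing in for the distributions induced by two trees, and the function names are invented for this example.

```python
import math
import random


def kl_monte_carlo(p, q, n_samples=20000, seed=0):
    """Estimate D(P||Q) = E_P[log P(x) - log Q(x)] by drawing x from P."""
    rng = random.Random(seed)
    patterns = list(p)
    weights = [p[x] for x in patterns]
    draws = rng.choices(patterns, weights=weights, k=n_samples)
    return sum(math.log(p[x]) - math.log(q[x]) for x in draws) / n_samples


def kl_exact(p, q):
    """Exact divergence, feasible here only because the toy space is tiny."""
    return sum(p[x] * math.log(p[x] / q[x]) for x in p)


# Hypothetical site-pattern probabilities (illustrative values only).
p_tree1 = {"AA": 0.4, "AC": 0.3, "CA": 0.2, "CC": 0.1}
p_tree2 = {"AA": 0.25, "AC": 0.25, "CA": 0.25, "CC": 0.25}

estimate = kl_monte_carlo(p_tree1, p_tree2)
exact = kl_exact(p_tree1, p_tree2)
print(f"Monte Carlo estimate: {estimate:.4f}, exact: {exact:.4f}")
```

For real sequence data the space of site patterns is far too large to enumerate, which is why a sampling estimate of this kind is needed in place of the exact sum.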
Philos Trans R Soc Lond B Biol Sci
September 2015
The root of a phylogenetic tree is fundamental to its biological interpretation, but standard substitution models do not provide any information on its position. Here, we describe two recently developed models that relax the usual assumptions of stationarity and reversibility, thereby facilitating root inference without the need for an outgroup. We compare the performance of these models on a classic test case for phylogenetic methods, before considering two highly topical questions in evolutionary biology: the deep structure of the tree of life and the root of the archaeal radiation.
Stat Appl Genet Mol Biol
October 2014
In molecular phylogenetics, standard models of sequence evolution generally assume that sequence composition remains constant over evolutionary time. However, this assumption is violated in many datasets which show substantial heterogeneity in sequence composition across taxa. We propose a model which allows compositional heterogeneity across branches, and formulate the model in a Bayesian framework.
IEEE/ACM Trans Comput Biol Bioinform
March 2016
Most phylogenetic analyses result in a sample of trees, but summarizing and visualizing these samples can be challenging. Consensus trees often provide limited information about a sample, and so methods such as consensus networks, clustering and multidimensional scaling have been developed and applied to tree samples. This paper describes a stochastic algorithm for constructing a principal geodesic or line through treespace which is analogous to the first principal component in standard principal components analysis.
Determining the relationships among the major groups of cellular life is important for understanding the evolution of biological diversity, but is difficult given the enormous time spans involved. In the textbook 'three domains' tree based on informational genes, eukaryotes and Archaea share a common ancestor to the exclusion of Bacteria. However, some phylogenetic analyses of the same data have placed eukaryotes within the Archaea, as the nearest relatives of different archaeal lineages.
A number of biological processes can lead to genes being copied within the genome of some given species. Duplicate genes of this form are called paralogs, and such genes share a high degree of sequence similarity and often have closely related functions. Some genes have become widely duplicated to form multigene families in which the copies are distributed both within the genomes of individual species and across different species.
Phylogenetic analysis very commonly produces several alternative trees for a given fixed set of taxa. For example, different sets of orthologous genes may be analyzed, or the analysis may sample from a distribution of probable trees. This article describes an approach to comparing and visualizing multiple alternative phylogenies via the idea of a "tree of trees" or "meta-tree."
Stat Appl Genet Mol Biol
June 2006
Experiments to determine the complete 3-dimensional structures of protein complexes are difficult to perform and only a limited range of such structures is available. In contrast, large-scale screening experiments have identified thousands of pairwise interactions between proteins, but such experiments do not produce explicit structural information. In addition, the data produced by these high-throughput experiments contain large numbers of false positive results, and can be biased against detection of certain types of interaction.
Summary: We describe an algorithm and software tool for comparing alternative phylogenetic trees. The main application of the software is to compare phylogenies obtained using different phylogenetic methods for some fixed set of species or obtained using different gene sequences from those species. The algorithm pairs up each branch in one phylogeny with a matching branch in the second phylogeny and finds the optimum 1-to-1 map between branches in the two trees in terms of a topological score.
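The branch-pairing idea can be sketched in a few lines. This is not the published tool's algorithm or scoring function. It assumes each branch is represented by the split of the taxa it induces, that the topological score is a Jaccard similarity taken over both orientations of each split, and that the optimum map is found by brute force over permutations, which is only workable for very small trees. All names are hypothetical.

```python
from itertools import permutations


def split_score(a, b, taxa):
    """Similarity of two splits: best Jaccard index over both orientations."""
    def jaccard(x, y):
        return len(x & y) / len(x | y)
    sides_a = (a, taxa - a)
    sides_b = (b, taxa - b)
    return max(jaccard(x, y) for x in sides_a for y in sides_b)


def best_branch_map(splits1, splits2, taxa):
    """Exhaustively find the 1-to-1 branch map with the highest total score.

    Assumes both trees have the same number of internal branches.
    """
    if len(splits1) != len(splits2):
        raise ValueError("this sketch needs equal numbers of branches")
    best_total, best_perm = -1.0, None
    for perm in permutations(range(len(splits2))):
        total = sum(split_score(splits1[i], splits2[j], taxa)
                    for i, j in enumerate(perm))
        if total > best_total:
            best_total, best_perm = total, perm
    return best_total, list(enumerate(best_perm))


# Two hypothetical five-taxon trees, given by their internal splits.
taxa = frozenset("ABCDE")
tree1 = [frozenset("AB"), frozenset("DE")]   # ((A,B),C,(D,E))
tree2 = [frozenset("AC"), frozenset("DE")]   # ((A,C),B,(D,E))

total, mapping = best_branch_map(tree1, tree2, taxa)
print(total, mapping)
```

For trees of realistic size, the exhaustive search would be replaced by a polynomial-time assignment solver such as the Hungarian algorithm. The brute-force loop is used here only to keep the example dependency-free.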
Motivation: Several methods have recently been developed to analyse large-scale sets of physical interactions between proteins in terms of physical contacts between the constituent domains, often with a view to predicting new pairwise interactions. Our aim is to combine genomic interaction data, in which domain-domain contacts are not explicitly reported, with the domain-level structure of individual proteins, in order to learn about the structure of interacting protein pairs. Our approach is driven by the need to assess the evidence for physical contacts between domains in a statistically rigorous way.