Publications by authors named "Sebastien Roch"

Maximum likelihood estimation is among the most widely-used methods for inferring phylogenetic trees from sequence data. This paper solves the problem of computing solutions to the maximum likelihood problem for 3-leaf trees under the 2-state symmetric mutation model (CFN model). Our main result is a closed-form solution to the maximum likelihood problem for unrooted 3-leaf trees, given generic data; this result characterizes all of the ways that a maximum likelihood estimate can fail to exist for generic data and provides theoretical validation for predictions made in Parks and Goldman (Syst Biol 63(5):798-811, 2014).

View Article and Find Full Text PDF

We consider species tree estimation from multiple loci subject to intralocus recombination. We focus on , a summary coalescent-based method using rooted triplets, as well as a related quartet-based inference method. We demonstrate analytically that in both cases, intralocus recombination gives rise to an inconsistency zone, in which correct inference is not assured even in the limit of infinite amount of data.

View Article and Find Full Text PDF

Species tree estimation faces many significant hurdles. Chief among them is that the trees describing the ancestral lineages of each individual gene-the gene trees-often differ from the species tree. The multispecies coalescent is commonly used to model this gene tree discordance, at least when it is believed to arise from incomplete lineage sorting, a population-genetic effect.

View Article and Find Full Text PDF

Phylogenomics-the estimation of species trees from multilocus data sets-is a common step in many biological studies. However, this estimation is challenged by the fact that genes can evolve under processes, including incomplete lineage sorting (ILS) and gene duplication and loss (GDL), that make their trees different from the species tree. In this article, we address the challenge of estimating the species tree under GDL.

View Article and Find Full Text PDF

We consider the problem of distance estimation under the TKF91 model of sequence evolution by insertions, deletions and substitutions on a phylogeny. In an asymptotic regime where the expected sequence lengths tend to infinity, we show that no consistent distance estimation is possible from sequence lengths alone. More formally, we establish that the distributions of pairs of sequence lengths at different distances cannot be distinguished with probability going to one.

View Article and Find Full Text PDF

In evolutionary biology, the speciation history of living organisms is represented graphically by a phylogeny, that is, a rooted tree whose leaves correspond to current species and whose branchings indicate past speciation events. Phylogenetic analyses often rely on molecular sequences, such as DNA sequences, collected from the species of interest, and it is common in this context to employ statistical approaches based on stochastic models of sequence evolution on a tree. For tractability, such models necessarily make simplifying assumptions about the evolutionary mechanisms involved.

View Article and Find Full Text PDF

To sample marginalized and/or hard-to-reach populations, respondent-driven sampling (RDS) and similar techniques reach their participants via peer referral. Under a Markov model for RDS, previous research has shown that if the typical participant refers too many contacts, then the variance of common estimators does not decay like [Formula: see text], where n is the sample size. This implies that confidence intervals will be far wider than under a typical sampling design.

View Article and Find Full Text PDF

With advances in sequencing technologies, there are now massive amounts of genomic data from across all life, leading to the possibility that a robust Tree of Life can be constructed. However, "gene tree heterogeneity", which is when different genomic regions can evolve differently, is a common phenomenon in multi-locus data sets, and reduces the accuracy of standard methods for species tree estimation that do not take this heterogeneity into account. New methods have been developed for species tree estimation that specifically address gene tree heterogeneity, and that have been proven to converge to the true species tree when the number of loci and number of sites per locus both increase (i.

View Article and Find Full Text PDF

The sample frequency spectrum (SFS), which describes the distribution of mutant alleles in a sample of DNA sequences, is a widely used summary statistic in population genetics. The expected SFS has a strong dependence on the historical population demography and this property is exploited by popular statistical methods to infer complex demographic histories from DNA sequence data. Most, if not all, of these inference methods exhibit pathological behavior, however.

View Article and Find Full Text PDF

Species tree reconstruction from genomic data is increasingly performed using methods that account for sources of gene tree discordance such as incomplete lineage sorting. One popular method for reconstructing species trees from unrooted gene tree topologies is ASTRAL. In this paper, we derive theoretical sample complexity results for the number of genes required by ASTRAL to guarantee reconstruction of the correct species tree with high probability.

View Article and Find Full Text PDF

Diffusion processes on trees are commonly used in evolutionary biology to model the joint distribution of continuous traits, such as body mass, across species. Estimating the parameters of such processes from tip values presents challenges because of the intrinsic correlation between the observations produced by the shared evolutionary history, thus violating the standard independence assumption of large-sample theory. For instance (Ho and Ané, Ann Stat 41:957-981, 2013) recently proved that the mean (also known in this context as selection optimum) of an Ornstein-Uhlenbeck process on a tree cannot be estimated consistently from an increasing number of tip observations if the tree height is bounded.

View Article and Find Full Text PDF

We consider the problem of estimating the evolutionary history of a set of species (phylogeny or species tree) from several genes. It is known that the evolutionary history of individual genes (gene trees) might be topologically distinct from each other and from the underlying species tree, possibly confounding phylogenetic analysis. A further complication in practice is that one has to estimate gene trees from molecular sequences of finite length.

View Article and Find Full Text PDF

The estimation of species trees using multiple loci has become increasingly common. Because different loci can have different phylogenetic histories (reflected in different gene tree topologies) for multiple biological causes, new approaches to species tree estimation have been developed that take gene tree heterogeneity into account. Among these multiple causes, incomplete lineage sorting (ILS), modeled by the multi-species coalescent, is potentially the most common cause of gene tree heterogeneity, and much of the focus of the recent literature has been on how to estimate species trees in the presence of ILS.

View Article and Find Full Text PDF

The reconstruction of a species tree from genomic data faces a double hurdle. First, the (gene) tree describing the evolution of each gene may differ from the species tree, for instance, due to incomplete lineage sorting. Second, the aligned genetic sequences at the leaves of each gene tree provide merely an imperfect estimate of the topology of the gene tree.

View Article and Find Full Text PDF

Incomplete lineage sorting (ILS) is a common source of gene tree incongruence in multilocus analyses. Numerous approaches have been developed to infer species trees in the presence of ILS. Here we provide a mathematical analysis of several coalescent-based methods.

View Article and Find Full Text PDF

Lateral gene transfer (LGT) is a common mechanism of nonvertical evolution, during which genetic material is transferred between two more or less distantly related organisms. It is particularly common in bacteria where it contributes to adaptive evolution with important medical implications. In evolutionary studies, LGT has been shown to create widespread discordance between gene trees as genomes become mosaics of gene histories.

View Article and Find Full Text PDF

Mutation rate variation across loci is well known to cause difficulties, notably identifiability issues, in the reconstruction of evolutionary trees from molecular sequences. Here we introduce a new approach for estimating general rates-across-sites models. Our results imply, in particular, that large phylogenies are typically identifiable under rate variation.

View Article and Find Full Text PDF

The accurate reconstruction of phylogenies from short molecular sequences is an important problem in computational biology. Recent work has highlighted deep connections between sequence-length requirements for high-probability phylogeny reconstruction and the related problem of the estimation of ancestral sequences. In Daskalakis et al.

View Article and Find Full Text PDF

The matrix of evolutionary distances is a model-based statistic, derived from molecular sequences, summarizing the pairwise phylogenetic relations between a collection of species. Phylogenetic tree reconstruction methods relying on this matrix are relatively fast and thus widely used in molecular systematics. However, because of their intrinsic reliance on summary statistics, distance-matrix methods are assumed to be less accurate than likelihood-based approaches.

View Article and Find Full Text PDF

We introduce a simple computationally efficient algorithm for reconstructing phylogenies from multiple gene trees in the presence of incomplete lineage sorting, that is, when the topology of the gene trees may differ from that of the species tree. We show that our technique is statistically consistent under standard stochastic assumptions, that is, it returns the correct tree given sufficiently many unlinked loci. We also show that it can tolerate moderate estimation errors.

View Article and Find Full Text PDF

Ancestral maximum likelihood (AML) is a method that simultaneously reconstructs a phylogenetic tree and ancestral sequences from extant data (sequences at the leaves). The tree and ancestral sequences maximize the probability of observing the given data under a Markov model of sequence evolution, in which branch lengths are also optimized but constrained to take the same value on any edge across all sequence sites. AML differs from the more usual form of maximum likelihood (ML) in phylogenetics because ML averages over all possible ancestral sequences.

View Article and Find Full Text PDF

If someone is nice to you, you feel good and may be inclined to be nice to somebody else. This every day experience is borne out by experimental games: the recipients of an act of kindness are more likely to help in turn, even if the person who benefits from their generosity is somebody else. This behaviour, which has been called 'upstream reciprocity', appears to be a misdirected act of gratitude: you help somebody because somebody else has helped you.

View Article and Find Full Text PDF

Maximum likelihood is one of the most widely used techniques to infer evolutionary histories. Although it is thought to be intractable, a proof of its hardness has been lacking. Here, we give a short proof that computing the maximum likelihood tree is NP-hard by exploiting a connection between likelihood and parsimony observed by Tuffley and Steel.

View Article and Find Full Text PDF

A PHP Error was encountered

Severity: Warning

Message: fopen(/var/lib/php/sessions/ci_session15phq66kq517sf8q642dep6b45nbarb5): Failed to open stream: No space left on device

Filename: drivers/Session_files_driver.php

Line Number: 177

Backtrace:

File: /var/www/html/index.php
Line: 316
Function: require_once

A PHP Error was encountered

Severity: Warning

Message: session_start(): Failed to read session data: user (path: /var/lib/php/sessions)

Filename: Session/Session.php

Line Number: 137

Backtrace:

File: /var/www/html/index.php
Line: 316
Function: require_once