J Comput Biol
November 2024
We address the problem of how to estimate a phylogenetic network when given single-nucleotide polymorphisms (i.e., SNPs, or bi-allelic markers that have evolved under the infinite sites assumption).
Multi-type birth-death processes underlie approaches for inferring evolutionary dynamics from phylogenetic trees across biological scales, ranging from deep-time species macroevolution to rapid viral evolution and somatic cellular proliferation. A limitation of current phylogenetic birth-death models is that they require restrictive linearity assumptions that yield tractable message-passing likelihoods, but that also preclude interactions between individuals. Many fundamental evolutionary processes - such as environmental carrying capacity or frequency-dependent selection - entail interactions, and may strongly influence the dynamics in some systems.
Stoch Process Their Appl
May 2019
Given an edge-weighted tree T with n leaves, sample the leaves uniformly at random without replacement and let W_k, 2 ≤ k ≤ n, be the length of the subtree spanned by the first k leaves. We consider the question, "Can T be identified (up to isomorphism) by the joint probability distribution of the random vector (W_2, …, W_n)?" We show that if T is known to belong to one of various families of edge-weighted trees, then the answer is, "Yes." These families include the edge-weighted trees with edge-weights in general position, the ultrametric edge-weighted trees, and certain families with equal weights on all edges such as (k + 1)-valent and rooted k-ary trees for k ≥ 2 and caterpillars.
Stoch Process Their Appl
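As a small illustration of the quantity studied in this abstract, the sketch below computes the length of the subtree spanned by the first k leaves of a given leaf order. The adjacency-dictionary representation, the function names `spanned_length` and `subtree_lengths`, and the three-leaf star used as input are all assumptions made for this example; none of them come from the paper.

```python
import random

def spanned_length(tree, leaves):
    """Total edge weight of the minimal subtree connecting `leaves`.

    `tree` maps each vertex to a dict {neighbor: edge_weight} (undirected).
    An edge is in the spanned subtree exactly when chosen leaves lie on
    both sides of it.
    """
    chosen = set(leaves)
    root = next(iter(tree))
    # Iterative depth-first search: record parents and a preorder.
    parent = {root: None}
    order = []
    stack = [root]
    while stack:
        v = stack.pop()
        order.append(v)
        for w in tree[v]:
            if w not in parent:
                parent[w] = v
                stack.append(w)
    # Count chosen leaves below each vertex, children before parents.
    below = {v: (1 if v in chosen else 0) for v in tree}
    total = 0.0
    for v in reversed(order):
        p = parent[v]
        if p is not None:
            if 0 < below[v] < len(chosen):
                total += tree[v][p]
            below[p] += below[v]
    return total

def subtree_lengths(tree, leaf_order):
    """Lengths of the subtrees spanned by the first k leaves, k = 2..n."""
    return [spanned_length(tree, leaf_order[:k])
            for k in range(2, len(leaf_order) + 1)]

# Hypothetical example: a star with center "c" and leaf edges of weight 1, 2, 3.
star = {
    "c": {"a": 1, "b": 2, "d": 3},
    "a": {"c": 1},
    "b": {"c": 2},
    "d": {"c": 3},
}
fixed_order_lengths = subtree_lengths(star, ["a", "b", "d"])

# A uniformly random leaf order (sampling without replacement).
random_order = random.sample(["a", "b", "d"], 3)
random_order_lengths = subtree_lengths(star, random_order)
```

Repeating the last two lines many times would give an empirical version of the joint distribution that the abstract asks about.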
July 2017
We consider a Markov chain that iteratively generates a sequence of random finite words in such a way that the n-th word is uniformly distributed over the set of words of length 2n in which n letters are a and n letters are b: at each step an a and a b are shuffled in uniformly at random among the letters of the current word. We obtain a concrete characterization of the Doob-Martin boundary of this Markov chain and thereby delineate all the ways in which the Markov chain can be conditioned to behave at large times. Writing N(u) for the number of letters a (equivalently, b) in the finite word u, we show that a sequence (u_n) of finite words converges to a point in the boundary if, for an arbitrary word v, there is convergence as n tends to infinity of the probability that the selection of N(v) letters a and N(v) letters b uniformly at random from u_n and maintaining their relative order results in v.
A metric measure space is a complete, separable metric space equipped with a probability measure that has full support. Two such spaces are equivalent if they are isometric as metric spaces via an isometry that maps the probability measure on the first space to the probability measure on the second. The resulting set of equivalence classes can be metrized with the Gromov-Prohorov metric of Greven, Pfaffelhuber and Winter.
The advent of accessible ancient DNA technology now allows the direct ascertainment of allele frequencies in ancestral populations, thereby enabling the use of allele frequency time series to detect and estimate natural selection. Such direct observations of allele frequency dynamics are expected to be more powerful than inferences made using patterns of linked neutral variation obtained from modern individuals. We developed a Bayesian method to make use of allele frequency time series data and infer the parameters of general diploid selection, along with allele age, in nonequilibrium populations.
We consider a population living in a patchy environment that varies stochastically in space and time. The population is composed of two morphs (that is, individuals of the same species with different genotypes). In terms of survival and reproductive success, the associated phenotypes differ only in their habitat selection strategies.
Evolutionary processes of natural selection may be expected to leave their mark on age patterns of survival and reproduction. Demographic theory includes three main strands: mutation accumulation, stochastic vitality, and optimal life histories. This paper reviews the three strands and, concentrating on mutation accumulation, extends a mathematical result with broad implications concerning the effect of interactions between small age-specific effects of deleterious mutant alleles.
We investigate the properties of a Wright-Fisher diffusion process starting at frequency x at time 0 and conditioned to be at frequency y at time T. Such a process is called a bridge. Bridges arise naturally in the analysis of selection acting on standing variation and in the inference of selection from allele frequency time series.
Proc Natl Acad Sci U S A
June 2013
W. D. Hamilton's celebrated formula for the age-specific force of natural selection furnishes predictions for senescent mortality due to mutation accumulation, at the price of reliance on a linear approximation.
Principal components analysis (PCA) and hierarchical clustering are two of the most heavily used techniques for analyzing the differences between nucleic acid sequence samples taken from a given environment. They have led to many insights regarding the structure of microbial communities. We have developed two new complementary methods that leverage how this microbial community data sits on a phylogenetic tree.
Recent advances in sequencing technologies have made available an ever-increasing amount of ancient genomic data. In particular, it is now possible to target specific single nucleotide polymorphisms in several samples at different time points. Such time-series data are also available in the context of experimental or viral evolution.
J R Stat Soc Series B Stat Methodol
June 2012
It is now common to survey microbial communities by sequencing nucleic acid material extracted in bulk from a given environment. Comparative methods are needed that indicate the extent to which two communities differ given data sets of this type. UniFrac, which gives a somewhat ad hoc phylogenetics-based distance between two communities, is one of the most commonly used tools for these analyses.
Background: There are several common ways to encode a tree as a matrix, such as the adjacency matrix, the Laplacian matrix (that is, the infinitesimal generator of the natural random walk), and the matrix of pairwise distances between leaves. Such representations involve a specific labeling of the vertices or at least the leaves, and so it is natural to attempt to identify trees by some feature of the associated matrices that is invariant under relabeling. An obvious candidate is the spectrum of eigenvalues (or, equivalently, the characteristic polynomial).
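To make the relabeling invariance concrete, here is a minimal sketch that computes the characteristic polynomial of an adjacency matrix exactly and checks that two labelings of the same tree give the same coefficients. The Faddeev-LeVerrier recursion is a standard technique chosen for this illustration, not a method attributed to the paper, and the helper names and the four-vertex path are assumptions.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def char_poly(adj):
    """Coefficients of det(x*I - adj), highest degree first.

    Uses the Faddeev-LeVerrier recursion with exact rational arithmetic.
    """
    n = len(adj)
    coeffs = [Fraction(1)]                      # leading coefficient
    m = [[Fraction(0)] * n for _ in range(n)]   # auxiliary matrix M_0 = 0
    for k in range(1, n + 1):
        am = matmul(adj, m)
        # M_k = adj * M_{k-1} + (previous coefficient) * I
        m = [[am[i][j] + (coeffs[-1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        product = matmul(adj, m)
        trace = sum(product[i][i] for i in range(n))
        coeffs.append(-trace / k)
    return coeffs

def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix of an undirected graph on vertices 0..n-1."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1
    return a

# The same 4-vertex path under two different vertex labelings.
path_a = adjacency(4, [(0, 1), (1, 2), (2, 3)])
path_b = adjacency(4, [(2, 0), (0, 3), (3, 1)])

same_polynomial = char_poly(path_a) == char_poly(path_b)
```

The two matrices differ entry by entry, yet their characteristic polynomials agree, which is the invariance the abstract refers to.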
Classical ecological theory predicts that environmental stochasticity increases extinction risk by reducing the average per-capita growth rate of populations. For sedentary populations in a spatially homogeneous yet temporally variable environment, a simple model of population growth is a stochastic differential equation dZ(t) = μZ(t)dt + σZ(t)dW(t), t ≥ 0, where the conditional law of Z(t+Δt) − Z(t) given Z(t) = z has mean and variance approximately zμΔt and z²σ²Δt when the time increment Δt is small. The long-term stochastic growth rate lim(t→∞) t⁻¹ log Z(t) for such a population equals μ − σ²/2.
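The stated growth rate can be checked numerically. The following is a rough sketch, assuming an Euler-Maruyama discretization of the stochastic differential equation with illustrative parameter values (μ = 0.1, σ = 0.5) that are not taken from the paper; the function name is likewise hypothetical.

```python
import math
import random

def simulated_growth_rate(mu, sigma, horizon, dt, rng):
    """Euler-Maruyama path of dZ = mu*Z dt + sigma*Z dW with Z(0) = 1.

    Returns log Z(T) / T for the simulated path.
    """
    steps = int(round(horizon / dt))
    log_z = 0.0  # work with log Z to avoid overflow or underflow
    for _ in range(steps):
        increment = mu * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        log_z += math.log1p(increment)  # Z_next = Z * (1 + increment)
    return log_z / (steps * dt)

mu, sigma = 0.1, 0.5           # assumed values for illustration only
rng = random.Random(0)
paths = [simulated_growth_rate(mu, sigma, horizon=200.0, dt=0.01, rng=rng)
         for _ in range(20)]
estimate = sum(paths) / len(paths)
theory = mu - sigma ** 2 / 2   # the value stated in the abstract
```

With these values the mean per-capita growth μ is positive while μ − σ²/2 is negative, so the averaged `estimate` should sit close to `theory` rather than to `mu`.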
Recent whole genome polymerase binding assays in the Drosophila embryo have shown that a substantial proportion of uninduced genes have pre-assembled RNA polymerase-II transcription initiation complex (PIC) bound to their promoters. These constitute a subset of promoter proximally paused genes for which mRNA elongation instead of promoter access is regulated. This difference can be described as a rearrangement of the regulatory topology to control the downstream transcriptional process of elongation rather than the upstream transcriptional initiation event.
Background: The identification of binding targets for proteins using ChIP-Seq has gained popularity as an alternative to ChIP-chip. Sequencing can, in principle, eliminate artifacts associated with microarrays, and cheap sequencing offers the ability to sequence deeply and obtain a comprehensive survey of binding. A number of algorithms have been developed to call "peaks" representing bound regions from mapped reads.
Background: We study the statistical properties of fragment coverage in genome sequencing experiments. In an extension of the classic Lander-Waterman model, we consider the effect of the length distribution of fragments. We also introduce a coding of the shape of the coverage depth function as a tree and explain how this can be used to detect regions with anomalous coverage.
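For orientation, the sketch below simulates only the classic fixed-length setting that this work extends, not the length-distribution extension itself. It assumes a circular genome (to avoid edge effects), uniformly placed fragments of a single length, and made-up sizes; under those assumptions the covered fraction should be close to 1 − e^(−c), where c is the number of fragments times the fragment length divided by the genome length.

```python
import math
import random

def covered_fraction(genome_len, frag_len, n_frags, rng):
    """Fraction of positions hit by at least one fragment on a circular genome."""
    covered = bytearray(genome_len)
    for _ in range(n_frags):
        start = rng.randrange(genome_len)
        for offset in range(frag_len):
            covered[(start + offset) % genome_len] = 1
    return sum(covered) / genome_len

# Illustrative sizes only: coverage depth c = 200 * 100 / 10000 = 2.
genome_len, frag_len, n_frags = 10_000, 100, 200
rng = random.Random(1)
runs = [covered_fraction(genome_len, frag_len, n_frags, rng) for _ in range(50)]
simulated = sum(runs) / len(runs)
depth = n_frags * frag_len / genome_len
approximation = 1.0 - math.exp(-depth)
```

Replacing the constant `frag_len` with a draw from a length distribution would be one way to explore the effect the abstract describes, though the paper's own analysis is not reproduced here.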
We consider the problem of constructing confidence intervals for the mean of a Negative Binomial random variable based upon sampled data. When the sample size is large, it is a common practice to rely upon a Normal distribution approximation to construct these intervals. However, we demonstrate that the sample mean of highly dispersed Negative Binomials exhibits a slow convergence in distribution to the Normal as a function of the sample size.
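The effect can be seen in a small simulation. This is a sketch under simplifying assumptions: it uses the size-1 Negative Binomial (the geometric distribution of failures before the first success) as a convenient dispersed case, a mean of 10, and the usual Normal-approximation interval built from the sample mean and sample variance. The function names, parameter values, and sample sizes are chosen for illustration and are not taken from the paper.

```python
import math
import random

def geometric_failures(mean, rng):
    """Draw from a Negative Binomial with size 1: failures before the first success."""
    p = 1.0 / (1.0 + mean)          # success probability giving the requested mean
    u = 1.0 - rng.random()          # uniform on (0, 1], so log(u) is finite
    return int(math.log(u) / math.log(1.0 - p))

def normal_ci_coverage(mean, n, reps, rng, z=1.96):
    """Share of replicates whose interval xbar +/- z*s/sqrt(n) contains the true mean."""
    hits = 0
    for _ in range(reps):
        xs = [geometric_failures(mean, rng) for _ in range(n)]
        xbar = sum(xs) / n
        var = sum((x - xbar) ** 2 for x in xs) / (n - 1)
        half_width = z * math.sqrt(var / n)
        if xbar - half_width <= mean <= xbar + half_width:
            hits += 1
    return hits / reps

rng = random.Random(2)
coverage_small = normal_ci_coverage(mean=10.0, n=10, reps=1000, rng=rng)
coverage_large = normal_ci_coverage(mean=10.0, n=400, reps=1000, rng=rng)
```

Comparing `coverage_small` with `coverage_large` against the nominal 95% level gives a feel for how slowly the Normal approximation becomes adequate as the sample size grows.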
New models for evolutionary processes of mutation accumulation allow hypotheses about the age-specificity of mutational effects to be translated into predictions of heterogeneous population hazard functions. We apply these models to questions in the biodemography of longevity, including proposed explanations of Gompertz hazards and mortality plateaus.
Recent statistical and computational analyses have shown that a genealogical most recent common ancestor (MRCA) may have lived in the recent past [Chang, J.T., 1999.
Theor Popul Biol
June 2007
A fissioning organism may purge unrepairable damage by bequeathing it preferentially to one of its daughters. Using the mathematical formalism of superprocesses, we propose a flexible class of analytically tractable models that allow quite general effects of damage on death rates and splitting rates and similarly general damage segregation mechanisms. We show that, in a suitable regime, the effects of randomness in damage segregation at fissioning are indistinguishable from those of randomness in the mechanism of damage accumulation during the organism's lifetime.
IEEE/ACM Trans Comput Biol Bioinform
November 2006
The rates-across-sites assumption in phylogenetic inference posits that the rate matrix governing the Markovian evolution of a character on an edge of the putative phylogenetic tree is the product of a character-specific scale factor and a rate matrix that is particular to that edge. Thus, evolution follows basically the same process for all characters, except that it occurs faster for some characters than others. To allow estimation of tree topologies and edge lengths for such models, it is commonly assumed that the scale factors are not arbitrary unknown constants, but rather unobserved, independent, identically distributed draws from a member of some parametric family of distributions.
A forward diffusion equation describing the evolution of the allele frequency spectrum is presented. The influx of mutations is accounted for by imposing a suitable boundary condition. For a Wright-Fisher diffusion with or without selection and varying population size, the boundary condition is lim(x↓0) x f(x,t) = θρ(t), where f(.