Insertions and deletions constitute the second most important source of natural genomic variation. They account for up to 25% of genomic variants in humans and are involved in complex evolutionary processes including genomic rearrangements, adaptation, and speciation. Recent advances in long-read sequencing technologies allow detailed inference of insertion and deletion variation in species and populations.
Background: Though Plasmodium vivax is the second most common malaria species to infect humans, it has not traditionally been considered a major human health concern in central Africa given the high prevalence of the human Duffy-negative phenotype that is believed to prevent infection. Increasing reports of asymptomatic and symptomatic infections in Duffy-negative individuals throughout Africa raise the possibility that P. vivax is evolving to evade host resistance, but there are few parasite samples with genomic data available from this part of the world.
The Open Tree of Life (OToL) project produces a supertree that summarizes phylogenetic knowledge from tree estimates published in the primary literature. The supertree construction algorithm iteratively calls Aho's Build algorithm thousands of times in order to assess the compatibility of different phylogenetic groupings. We describe an incrementalized version of the Build algorithm that is able to share work between successive calls to Build.
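As a rough illustration of the compatibility check mentioned above, the following is a minimal sketch of the classic, non-incremental Build algorithm of Aho et al. on rooted triples. It is not the incrementalized version the abstract describes, and the function name `build`, the triple encoding `(a, b, c)` meaning "a and b are closer to each other than either is to c", and the nested-tuple output are all assumptions made for this example.

```python
def build(leaves, triples):
    """Sketch of Aho et al.'s Build algorithm (plain, non-incremental form).

    leaves  : iterable of distinct leaf labels
    triples : iterable of (a, b, c) rooted triples, read as ab|c
    Returns a nested tuple describing a tree that displays every triple,
    or None if the triples are mutually incompatible.
    """
    leaves = list(leaves)
    if len(leaves) == 1:
        return leaves[0]

    # Keep only the triples whose three leaves all lie in this leaf set.
    leaf_set = set(leaves)
    relevant = [t for t in triples if set(t) <= leaf_set]

    # Union-find over leaves: a triple ab|c forces a and b into the same
    # child subtree of the current root.
    parent = {x: x for x in leaves}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, _ in relevant:
        parent[find(a)] = find(b)

    # Each connected component becomes one child of the root.
    groups = {}
    for x in leaves:
        groups.setdefault(find(x), []).append(x)

    # A single component means no valid split exists: incompatible input.
    if len(groups) == 1:
        return None

    subtrees = []
    for members in groups.values():
        sub = build(members, relevant)
        if sub is None:
            return None
        subtrees.append(sub)
    return tuple(subtrees)
```

For instance, `build(["a", "b", "c"], [("a", "b", "c")])` yields a tree grouping `a` with `b`, while adding the conflicting triple `("b", "c", "a")` makes the call return `None`. Each call here starts from scratch, which is the repeated work an incremental variant would aim to share.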
Proc Natl Acad Sci U S A
August 2022
To assess the conventional treatment in evolutionary inference of alignment gaps as missing data, we propose a simple nonparametric test of the null hypothesis that the locations of alignment gaps are independent of the nucleotide substitution or amino acid replacement process. When we apply the test to 1,390 protein alignments that are informed by protein tertiary structure and use a 5% significance level, the null hypothesis of independence between amino acid replacement and gap location is rejected for ∼65% of datasets. Via simulations that include substitution and insertion-deletion, we show that the test performs well with true alignments.
Bioinformatics
September 2021
Summary: We describe improvements to BAli-Phy, a Markov chain Monte Carlo (MCMC) program that jointly estimates phylogeny, alignment and other parameters from unaligned sequence data. Version 3 is substantially faster for large trees, and implements covarion models, additional codon models and other new models. It implements ancestral state reconstruction, allows prior selection for all model parameters, and can also analyze multiple genes simultaneously.
We present a new supertree method that enables rapid estimation of a summary tree on the scale of millions of leaves. This supertree method summarizes a collection of input phylogenies and an input taxonomy. We introduce formal goals and criteria for such a supertree to satisfy in order to transparently and justifiably represent the input trees.
We present a Bayesian method for characterizing the mating system of populations reproducing through a mixture of self-fertilization and random outcrossing. Our method uses patterns of genetic variation across the genome as a basis for inference about reproduction under pure hermaphroditism, gynodioecy, and a model developed to describe the self-fertilizing killifish Kryptolebias marmoratus. We extend the standard coalescence model to accommodate these mating systems, accounting explicitly for multilocus identity disequilibrium, inbreeding depression, and variation in fertility among mating types.
Currently available phylogenetic methods for studying the rate of evolution in a continuously valued character assume that the rate is constant throughout the tree or that it changes along specific branches according to an a priori hypothesis of rate variation provided by the user. Herein, we describe a new method for studying evolutionary rate variation in continuously valued characters given an estimate of the phylogenetic history of the species in our study. According to this method, we propose no specific prior hypothesis for how the variation in evolutionary rate is structured throughout the history of the species in our study.
The availability of multiple teleost (bony fish) genomes is providing unprecedented opportunities to understand the diversity and function of gene duplication events using comparative genomics. Here we examine multiple paralogous genes of γ-glutamyl transferase (GGT) in several distantly related teleost species including medaka, stickleback, green spotted pufferfish, fugu, and zebrafish. Through mining genome databases, we have identified multiple GGT orthologs.
The Caloplaca saxicola group is the main group of saxicolous, lobed-effigurate species within genus Caloplaca (Teloschistaceae, lichen-forming Ascomycota). A recent monographic revision by the first author detected a wide range of morphological variation. To confront the phenotypically based circumscription of these taxa and to resolve their relationships, morphological and ITS rDNA data were obtained for 56 individuals representing eight Caloplaca species belonging to the C.
Philos Trans R Soc Lond B Biol Sci
December 2008
Models of molecular evolution tend to be overly simplistic caricatures of biology that are prone to assigning high probabilities to biologically implausible DNA or protein sequences. Here, we explore how to construct time-reversible evolutionary models that yield stationary distributions of sequences that match given target distributions. By adopting comparatively realistic target distributions, evolutionary models can be improved.
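To make the idea of a time-reversible model with a prescribed stationary distribution concrete, here is a small sketch using the familiar GTR-style parameterization on single sites, where the rate from state i to state j is a symmetric exchangeability times the target frequency of j. This is a standard textbook construction and not necessarily the one used in the article, which concerns distributions over whole sequences; the names `reversible_rates` and `satisfies_detailed_balance` are hypothetical.

```python
import math


def reversible_rates(target, exchange):
    """Off-diagonal rates r[i, j] = s_ij * pi_j (GTR-style sketch).

    target   : dict mapping state -> target stationary probability pi
    exchange : dict mapping frozenset({i, j}) -> symmetric exchangeability s_ij
    """
    states = list(target)
    return {
        (i, j): exchange[frozenset((i, j))] * target[j]
        for i in states
        for j in states
        if i != j
    }


def satisfies_detailed_balance(target, rates):
    """Check pi_i * r_ij == pi_j * r_ji for every ordered pair of states.

    Detailed balance implies the process is time-reversible and that
    `target` is its stationary distribution.
    """
    return all(
        math.isclose(target[i] * rates[(i, j)], target[j] * rates[(j, i)])
        for (i, j) in rates
    )


# Illustrative target frequencies and equal exchangeabilities (made-up values).
pi = {"A": 0.1, "C": 0.2, "G": 0.3, "T": 0.4}
s = {frozenset((i, j)): 1.0 for i in pi for j in pi if i != j}
rates = reversible_rates(pi, s)
```

Because the exchangeabilities are symmetric, the flux `pi_i * s_ij * pi_j` is the same in both directions for every pair, so any choice of target frequencies is reproduced as the stationary distribution.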
Background: Phylogenies of rapidly evolving pathogens can be difficult to resolve because of the small number of substitutions that accumulate in the short times since divergence. To improve resolution of such phylogenies we propose using insertion and deletion (indel) information in addition to substitution information. We accomplish this through joint estimation of alignment and phylogeny in a Bayesian framework, drawing inference using Markov chain Monte Carlo.
Summary: BAli-Phy is a Bayesian posterior sampler that employs Markov chain Monte Carlo to explore the joint space of alignment and phylogeny given molecular sequence data. Simultaneous estimation eliminates bias toward inaccurate alignment guide-trees, employs more sophisticated substitution models during alignment and automatically utilizes information in shared insertion/deletions to help infer phylogenies.
Availability: Software is available for download at http://www.
We describe a novel model and algorithm for simultaneously estimating multiple molecular sequence alignments and the phylogenetic trees that relate the sequences. Unlike current techniques that base phylogeny estimates on a single estimate of the alignment, we take alignment uncertainty into account by considering all possible alignments. Furthermore, because the alignment and phylogeny are constructed simultaneously, a guide tree is not needed.