Ancestral sequence reconstruction (ASR) is a powerful tool to study the evolution of proteins and thus gain deep insight into the relationships among protein sequence, structure, and function. A major barrier to its broad use is the complexity of the task: it requires multiple software packages, complex file manipulations, and expert phylogenetic knowledge. Here we introduce topiary, a software pipeline that aims to overcome this barrier. To use topiary, users prepare a spreadsheet with a handful of sequences. Topiary then: (1) Infers the taxonomic scope for the ASR study and finds relevant sequences by BLAST; (2) Performs taxonomically informed sequence quality control and redundancy reduction; (3) Constructs a multiple sequence alignment; (4) Generates a maximum-likelihood gene tree; (5) Reconciles the gene tree to the species tree; (6) Reconstructs ancestral amino acid sequences; and (7) Determines branch supports. The pipeline returns annotated evolutionary trees, spreadsheets with sequences, and graphical summaries of ancestor quality. This is achieved by integrating modern phylogenetics software (Muscle5, RAxML-NG, GeneRax, and PastML) with online databases (NCBI and the Open Tree of Life). In this paper, we introduce non-expert readers to the steps required for ASR, describe the specific design choices made in topiary, provide a detailed protocol for users, and then validate the pipeline using datasets from a broad collection of protein families. Topiary is freely available for download: https://github.com/harmslab/topiary.
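The pipeline reports graphical summaries of ancestor quality. A common way to quantify that quality is the posterior probability of the most probable (MAP) residue at each site, plus a count of sites where a runner-up residue is also plausible. The Python sketch below is a minimal, hypothetical illustration of such a summary, assuming per-site posterior probabilities are available as dictionaries; the helper name `summarize_ancestor` and the 0.25 cutoff for flagging ambiguous sites are assumptions, not topiary's actual API or defaults.

```python
from dataclasses import dataclass


@dataclass
class AncestorSummary:
    map_sequence: str           # most probable (MAP) residue at each site
    mean_map_pp: float          # mean posterior probability of the MAP residues
    ambiguous_sites: list[int]  # 0-based sites with a strong runner-up residue


def summarize_ancestor(site_posteriors, alt_cutoff=0.25):
    """Summarize one reconstructed ancestor (hypothetical helper).

    site_posteriors: one dict per alignment site, mapping residue -> posterior
    probability. alt_cutoff: assumed threshold at or above which the
    second-best residue marks a site as ambiguous.
    """
    if not site_posteriors:
        raise ValueError("need at least one site")
    map_residues, map_pps, ambiguous = [], [], []
    for site, probs in enumerate(site_posteriors):
        # Rank residues at this site from most to least probable.
        ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
        best_residue, best_pp = ranked[0]
        map_residues.append(best_residue)
        map_pps.append(best_pp)
        # Flag the site if the second-best residue is also well supported.
        if len(ranked) > 1 and ranked[1][1] >= alt_cutoff:
            ambiguous.append(site)
    return AncestorSummary(
        map_sequence="".join(map_residues),
        mean_map_pp=sum(map_pps) / len(map_pps),
        ambiguous_sites=ambiguous,
    )


if __name__ == "__main__":
    posteriors = [{"A": 0.9, "S": 0.1}, {"L": 0.6, "I": 0.4}, {"K": 1.0}]
    summary = summarize_ancestor(posteriors)
    print(summary.map_sequence, round(summary.mean_map_pp, 3), summary.ambiguous_sites)
```

In this toy input, the second site is flagged because the alternative residue "I" has a posterior probability of 0.4. Averaging MAP posteriors gives one number per ancestor that is easy to plot across a tree, while the ambiguous-site list points to positions worth testing experimentally.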
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9847077
DOI: http://dx.doi.org/10.1002/pro.4551
PLoS Genet
January 2025
Melbourne Integrative Genomics, School of Mathematics & Statistics, University of Melbourne, Victoria, Australia.
Inference of evolutionary and demographic parameters from a sample of genome sequences often proceeds by first inferring identical-by-descent (IBD) genome segments. By exploiting efficient data encoding based on the ancestral recombination graph (ARG), we obtain three major advantages over current approaches: (i) no need to impose a length threshold on IBD segments, (ii) IBD can be defined without the hard-to-verify requirement of no recombination, and (iii) computation time can be reduced with little loss of statistical efficiency using only the IBD segments from a set of sequence pairs that scales linearly with sample size. We first demonstrate powerful inferences when true IBD information is available from simulated data.
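The abstract notes that computation can be reduced by using IBD segments from only a set of sequence pairs that scales linearly with sample size. It does not say how those pairs are chosen, so the Python sketch below uses a simple cyclic-neighbour scheme purely as an assumption, to contrast linear growth in the number of pairs with the quadratic growth of all pairs. The function names `all_pairs` and `linear_pairs` are hypothetical.

```python
from itertools import combinations


def all_pairs(n):
    """Every unordered pair of n samples: n*(n-1)/2 pairs."""
    return list(combinations(range(n), 2))


def linear_pairs(n, k=1):
    """Pair sample i with samples i+1, ..., i+k (wrapping around), giving at
    most k*n distinct unordered pairs. Hypothetical selection scheme."""
    pairs = set()
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            if i != j:
                # Store pairs in canonical (smaller, larger) order to avoid duplicates.
                pairs.add((min(i, j), max(i, j)))
    return sorted(pairs)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, len(all_pairs(n)), len(linear_pairs(n, k=2)))
```

For 1000 samples, all pairs number 499,500, whereas the cyclic scheme with `k=2` uses 2000. That gap is what makes a linearly scaling pair set attractive when per-pair IBD computations dominate the cost.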
Mol Biol Evol
January 2025
Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, USA.
Ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO) is an ancient protein critical for CO2-fixation and global biogeochemistry. Form-I RuBisCO complexes uniquely harbor small subunits that form a hexadecameric complex together with their large subunits. The small subunit protein is thought to have significantly contributed to RuBisCO's response to the atmospheric rise of O2 ∼2.
Ann Hum Genet
January 2025
Institute of Legal Medicine, Medical University of Innsbruck, Innsbruck, Austria.
Introduction: Southern Africa has been inhabited by hunter-gatherers for at least 20,000 years and has received diverse immigration flows in the last 2000 years. The original inhabitants have interacted with the pastoralist migrants from Eastern Africa (∼2000 ybp), followed by the southern Bantu migration arriving some 1000 ybp, and more recently with the European and Asian settlers after the 17th century. Many of the original Khoekhoe and San inhabitants have either become extinct or have disappeared through admixture in South Africa (SA), in a sex-biased manner involving KhoeSan women.
Nat Commun
January 2025
Interdisciplinary Life Sciences Graduate Programs, University of Texas at Austin, Austin, TX, 78712, USA.
Type II CRISPR endonucleases are widely used programmable genome editing tools. Recently, CRISPR-Cas systems with highly compact nucleases have been discovered, including Cas9d (a type II-D nuclease). Here, we report the cryo-EM structures of a Cas9d nuclease (747 amino acids in length) in multiple functional states, revealing a stepwise process of DNA targeting involving a conformational switch in a REC2 domain insertion.
Viruses
November 2024
Department of Public Health, Ministry of Health, P.O. Box 24923, Kuwait City 13110, Kuwait.
Continuous surveillance is critical for early intervention against emerging novel SARS-CoV-2 variants. Therefore, we investigated and compared the variant-specific evolutionary epidemiology of all the Delta and Omicron sequences collected between 2021 and 2023 in Kuwait. We used Bayesian phylodynamic models to reconstruct, trace, and compare the roles of the two variants' demographic, phylogeographic, and host characteristics in shaping their evolutionary epidemiology.