AI Article Synopsis

  • Ancestral sequence reconstruction (ASR) is a key method for examining protein evolution, but its complexity limits widespread use: it requires multiple software packages, complex file manipulations, and expert phylogenetic knowledge.
  • Topiary is a new software pipeline designed to simplify ASR. Users supply a handful of sequences in a spreadsheet, and the pipeline automates the steps that follow, including sequence alignment and evolutionary tree generation.
  • The pipeline integrates modern phylogenetic tools (Muscle5, RAxML-NG, GeneRax, and PastML) with online databases (NCBI and the Open Tree of Life) and returns annotated trees, sequence spreadsheets, and graphical summaries of ancestor quality, making ASR accessible to non-experts.

Article Abstract

Ancestral sequence reconstruction (ASR) is a powerful tool to study the evolution of proteins and thus gain deep insight into the relationships among protein sequence, structure, and function. A major barrier to its broad use is the complexity of the task: it requires multiple software packages, complex file manipulations, and expert phylogenetic knowledge. Here we introduce topiary, a software pipeline that aims to overcome this barrier. To use topiary, users prepare a spreadsheet with a handful of sequences. Topiary then: (1) Infers the taxonomic scope for the ASR study and finds relevant sequences by BLAST; (2) Does taxonomically informed sequence quality control and redundancy reduction; (3) Constructs a multiple sequence alignment; (4) Generates a maximum-likelihood gene tree; (5) Reconciles the gene tree to the species tree; (6) Reconstructs ancestral amino acid sequences; and (7) Determines branch supports. The pipeline returns annotated evolutionary trees, spreadsheets with sequences, and graphical summaries of ancestor quality. This is achieved by integrating modern phylogenetics software (Muscle5, RAxML-NG, GeneRax, and PastML) with online databases (NCBI and the Open Tree of Life). In this paper, we introduce non-expert readers to the steps required for ASR, describe the specific design choices made in topiary, provide a detailed protocol for users, and then validate the pipeline using datasets from a broad collection of protein families. Topiary is freely available for download: https://github.com/harmslab/topiary.
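For readers new to ASR, the idea behind step (6) can be illustrated with a toy example. The sketch below is not topiary's method: topiary relies on maximum-likelihood tools, whereas this uses the much simpler Fitch parsimony rule on a fixed tree. The tree shape, species labels, and sequences are hypothetical and chosen only for illustration.

```python
# Toy illustration only: Fitch parsimony on a fixed, hypothetical tree.
# Topiary itself uses maximum-likelihood software, not this method.

def fitch_site(tree, states):
    """Bottom-up Fitch pass for one alignment column.

    tree   : a leaf name (str) or a 2-tuple of subtrees (binary tree)
    states : dict mapping leaf name -> residue at this column
    Returns the set of candidate residues at the root of `tree`.
    """
    if isinstance(tree, str):
        return {states[tree]}
    left, right = (fitch_site(child, states) for child in tree)
    shared = left & right
    # Keep the intersection when it exists, otherwise the union.
    return shared if shared else left | right


def reconstruct_root(tree, alignment):
    """Reconstruct the root sequence column by column.

    Columns with more than one equally parsimonious residue are
    reported as 'X' (ambiguous).
    """
    length = len(next(iter(alignment.values())))
    ancestor = []
    for i in range(length):
        column = {name: seq[i] for name, seq in alignment.items()}
        candidates = fitch_site(tree, column)
        ancestor.append(next(iter(candidates)) if len(candidates) == 1 else "X")
    return "".join(ancestor)


# Hypothetical aligned sequences and species tree.
tree = (("human", "mouse"), ("chicken", "frog"))
alignment = {
    "human": "MKVL",
    "mouse": "MKIL",
    "chicken": "MRVL",
    "frog": "MRVL",
}

print(reconstruct_root(tree, alignment))  # prints MXVL
```

The second column is reported as ambiguous because the two halves of the tree disagree (K versus R) and parsimony alone cannot choose between them. Likelihood-based reconstruction, as used in the pipeline, instead assigns probabilities to each candidate residue.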

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9847077
DOI: http://dx.doi.org/10.1002/pro.4551

Publication Analysis

Top Keywords

ancestral sequence: 8
sequence reconstruction: 8
gene tree: 8
topiary: 6
sequence: 5
topiary pruning: 4
pruning manual: 4
manual labor: 4
labor ancestral: 4
reconstruction ancestral: 4

Similar Publications

Estimating evolutionary and demographic parameters via ARG-derived IBD.

PLoS Genet

January 2025

Melbourne Integrative Genomics, School of Mathematics & Statistics, University of Melbourne, Victoria, Australia.

Inference of evolutionary and demographic parameters from a sample of genome sequences often proceeds by first inferring identical-by-descent (IBD) genome segments. By exploiting efficient data encoding based on the ancestral recombination graph (ARG), we obtain three major advantages over current approaches: (i) no need to impose a length threshold on IBD segments, (ii) IBD can be defined without the hard-to-verify requirement of no recombination, and (iii) computation time can be reduced with little loss of statistical efficiency using only the IBD segments from a set of sequence pairs that scales linearly with sample size. We first demonstrate powerful inferences when true IBD information is available from simulated data.

Ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO) is an ancient protein critical for CO2-fixation and global biogeochemistry. Form-I RuBisCO complexes uniquely harbor small subunits that form a hexadecameric complex together with their large subunits. The small subunit protein is thought to have significantly contributed to RuBisCO's response to the atmospheric rise of O2 ∼2.

Introduction: Southern Africa has been inhabited by hunter-gatherers for at least 20,000 years and has received diverse immigration flows in the last 2000 years. The original inhabitants have interacted with the pastoralist migrants from Eastern Africa (∼2000 ybp), followed by the southern Bantu migration arriving some 1000 ybp, and more recently with the European and Asian settlers after the 17th century. Many of the original Khoekhoe and San inhabitants have either become extinct or have disappeared through admixture in South Africa (SA), in a sex-biased manner involving KhoeSan women.

Type II CRISPR endonucleases are widely used programmable genome editing tools. Recently, CRISPR-Cas systems with highly compact nucleases have been discovered, including Cas9d (a type II-D nuclease). Here, we report the cryo-EM structures of a Cas9d nuclease (747 amino acids in length) in multiple functional states, revealing a stepwise process of DNA targeting involving a conformational switch in a REC2 domain insertion.

Comparative Evolutionary Epidemiology of SARS-CoV-2 Delta and Omicron Variants in Kuwait.

Viruses

November 2024

Department of Public Health, Ministry of Health, P.O. Box 24923, Kuwait City 13110, Kuwait.

Continuous surveillance is critical for early intervention against emerging novel SARS-CoV-2 variants. Therefore, we investigated and compared the variant-specific evolutionary epidemiology of all the Delta and Omicron sequences collected between 2021 and 2023 in Kuwait. We used Bayesian phylodynamic models to reconstruct, trace, and compare the two variants' demographics, phylogeographic, and host characteristics in shaping their evolutionary epidemiology.
