Publications by Yatish Turakhia | LitMetric

Publications by authors named "Yatish Turakhia"


Accurate, scalable, and fully automated inference of species trees from raw genome assemblies using ROADIES.

Anshu Gupta, Siavash Mirarab, Yatish Turakhia

bioRxiv

June 2024

Inference of species trees plays a crucial role in advancing our understanding of evolutionary relationships and has immense significance for diverse biological and medical applications. Extensive genome sequencing efforts are currently in progress across a broad spectrum of life forms, holding the potential to unravel the intricate branching patterns within the tree of life. However, estimating species trees starting from raw genome sequences is quite challenging, and the current cutting-edge methodologies require a series of error-prone steps that are neither entirely automated nor standardized.


SARS-CoV-2 lineage assignments using phylogenetic placement/UShER are superior to pangoLEARN machine-learning method.

Adriano de Bernardi Schneider, Michelle Su, Angie S Hinrichs, Jade Wang, Helly Amin, Yatish Turakhia

Virus Evol

January 2024

With the rapid spread and evolution of SARS-CoV-2, the ability to monitor its transmission and distinguish among viral lineages is critical for pandemic response efforts. The most commonly used software for the lineage assignment of newly isolated SARS-CoV-2 genomes is pangolin, which offers two methods of assignment, pangoLEARN and pUShER. PangoLEARN rapidly assigns lineages using a machine-learning algorithm, while pUShER performs a phylogenetic placement to identify the lineage corresponding to a newly sequenced genome.


A framework for automated scalable designation of viral pathogen lineages from genomic data.

Jakob McBroome, Adriano de Bernardi Schneider, Cornelius Roemer, Michael T Wolfinger, Angie S Hinrichs, Yatish Turakhia

Nat Microbiol

February 2024

Pathogen lineage nomenclature systems are a key component of effective communication and collaboration for researchers and public health workers. Since February 2021, the Pango dynamic lineage nomenclature for SARS-CoV-2 has been sustained by crowdsourced lineage proposals as new isolates were sequenced. This approach is vulnerable to time-critical delays as well as regional and personal bias.


The ongoing evolution of UShER during the SARS-CoV-2 pandemic.

Angie Hinrichs, Cheng Ye, Yatish Turakhia, Russell Corbett-Detig

Nat Genet

January 2024


Whole-genome Comparisons Identify Repeated Regulatory Changes Underlying Convergent Appendage Evolution in Diverse Fish Lineages.

Heidi I Chen, Yatish Turakhia, Gill Bejerano, David M Kingsley

Mol Biol Evol

September 2023

Fins are major functional appendages of fish that have been repeatedly modified in different lineages. To search for genomic changes underlying natural fin diversity, we compared the genomes of 36 percomorph fish species that span over 100 million years of evolution and either have complete or reduced pelvic and caudal fins. We identify 1,614 genomic regions that are well-conserved in fin-complete species but missing from multiple fin-reduced lineages.


Tracking and curating putative SARS-CoV-2 recombinants with RIVET.

Kyle Smith, Cheng Ye, Yatish Turakhia

Bioinformatics

September 2023

Motivation: Identifying and tracking recombinant strains of SARS-CoV-2 is critical to understanding the evolution of the virus and controlling its spread. But confidently identifying SARS-CoV-2 recombinants from thousands of new genome sequences that are being shared online every day is quite challenging, causing many recombinants to be missed or suffer from weeks of delay in being formally identified while undergoing expert curation.

Results: We present RIVET, a software pipeline and visual platform that takes advantage of recent algorithmic advances in recombination inference to comprehensively and sensitively search for potential SARS-CoV-2 recombinants and organize the relevant information in a web interface that would help greatly accelerate the process of identifying and tracking recombinants.


DecentTree: scalable Neighbour-Joining for the genomic era.

Weiwen Wang, James Barbetti, Thomas Wong, Bryan Thornlow, Russ Corbett-Detig, Yatish Turakhia

Bioinformatics

September 2023

Motivation: Neighbour-Joining is one of the most widely used distance-based phylogenetic inference methods. However, current implementations do not scale well for datasets with more than 10 000 sequences. Given the increasing pace of generating new sequence data, particularly in outbreaks of emerging diseases, and the already enormous existing databases of sequence data for which Neighbour-Joining is a useful approach, new implementations of existing methods are warranted.
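
To make the scaling problem concrete, the sketch below shows one pair-selection step of textbook Neighbour-Joining in Python. It is not DecentTree's implementation; the function name `nj_pick_pair` and the five-taxon distance matrix are illustrative assumptions. Each step scans all pairs, so a naive implementation costs O(n^2) per join and O(n^3) overall, which is why large inputs become impractical.

```python
from itertools import combinations


def nj_pick_pair(dist):
    """Return the pair (i, j) minimising the Neighbour-Joining Q criterion.

    dist is a symmetric matrix (list of lists) of pairwise distances.
    Q(i, j) = (n - 2) * d(i, j) - r_i - r_j, where r_k is the sum of row k.
    """
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    best_pair, best_q = None, float("inf")
    # Scanning every pair is the O(n^2) cost paid at each of the n - 3 joins.
    for i, j in combinations(range(n), 2):
        q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
        if q < best_q:
            best_pair, best_q = (i, j), q
    return best_pair, best_q


# Hypothetical distances between five taxa (indices 0..4), for illustration only.
example = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

pair, q_value = nj_pick_pair(example)
print(pair, q_value)
```

A full run would merge the chosen pair into a new node, update the distance matrix, and repeat until three nodes remain.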


Online Phylogenetics with matOptimize Produces Equivalent Trees and is Dramatically More Efficient for Large SARS-CoV-2 Phylogenies than de novo and Maximum-Likelihood Implementations.

Alexander M Kramer, Bryan Thornlow, Cheng Ye, Nicola De Maio, Jakob McBroome, Yatish Turakhia

Syst Biol

November 2023

Phylogenetics has been foundational to SARS-CoV-2 research and public health policy, assisting in genomic surveillance, contact tracing, and assessing emergence and spread of new variants. However, phylogenetic analyses of SARS-CoV-2 have often relied on tools designed for de novo phylogenetic inference, in which all data are collected before any analysis is performed and the phylogeny is inferred once from scratch. SARS-CoV-2 data sets do not fit this mold.


A lung-specific mutational signature enables inference of viral and bacterial respiratory niche.

Christopher Ruis, Thomas P Peacock, Luis M Polo, Diego Masone, Maria Soledad Alvarez, Yatish Turakhia

Microb Genom

May 2023

Exposure to different mutagens leaves distinct mutational patterns that can allow inference of pathogen replication niches. We therefore investigated whether SARS-CoV-2 mutational spectra might show lineage-specific differences, dependent on the dominant site(s) of replication and onwards transmission, and could therefore rapidly infer virulence of emergent variants of concern (VOCs). Through mutational spectrum analysis, we found a significant reduction in G>T mutations in the Omicron variant, which replicates in the upper respiratory tract (URT), compared to other lineages, which replicate in both the URT and lower respiratory tract (LRT).
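
As a rough illustration of what a mutational spectrum is, the Python sketch below tallies substitution types and reports each as a proportion of the total. It is a simplified assumption, not the authors' pipeline: the function name `mutational_spectrum` and the toy mutation list are hypothetical, and a real analysis would also account for sequence context and base composition.

```python
from collections import Counter


def mutational_spectrum(mutations):
    """Return the proportion of each substitution type, e.g. {"G>T": 0.25}.

    mutations is an iterable of (reference_base, alternate_base) pairs.
    """
    counts = Counter(f"{ref}>{alt}" for ref, alt in mutations)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kind: count / total for kind, count in counts.items()}


# Toy input for illustration only; these are not observed SARS-CoV-2 data.
toy_mutations = [("C", "T"), ("C", "T"), ("G", "T"), ("A", "G")]
spectrum = mutational_spectrum(toy_mutations)
print(spectrum)
```

Comparing the "G>T" proportion between two such spectra, one per lineage, is the kind of contrast the abstract describes.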


Maximum likelihood pandemic-scale phylogenetics.

Nicola De Maio, Prabhav Kalaghatgi, Yatish Turakhia, Russell Corbett-Detig, Bui Quang Minh

Nat Genet

May 2023

Phylogenetics has a crucial role in genomic epidemiology. Enabled by unparalleled volumes of genome sequence data generated to study and help contain the COVID-19 pandemic, phylogenetic analyses of SARS-CoV-2 genomes have shed light on the virus's origins, spread, and the emergence and reproductive success of new variants. However, most phylogenetic approaches, including maximum likelihood and Bayesian methods, cannot scale to the size of the datasets from the current pandemic.


Whole-genome comparisons identify repeated regulatory changes underlying convergent appendage evolution in diverse fish lineages.

Heidi I Chen, Yatish Turakhia, Gill Bejerano, David M Kingsley

bioRxiv

January 2023

Fins are major functional appendages of fish that have been repeatedly modified in different lineages. To search for genomic changes underlying natural fin diversity, we compared the genomes of 36 wild fish species that either have complete or reduced pelvic and caudal fins. We identify 1,614 genomic regions that are well-conserved in fin-complete species but missing from multiple fin-reduced lineages.


Pandemic-scale phylogenomics reveals the SARS-CoV-2 recombination landscape.

Yatish Turakhia, Bryan Thornlow, Angie Hinrichs, Jakob McBroome, Nicolas Ayala

Nature

September 2022

Accurate and timely detection of recombinant lineages is crucial for interpreting genetic variation, reconstructing epidemic spread, identifying selection and variants of interest, and accurately performing phylogenetic analyses. During the SARS-CoV-2 pandemic, genomic data generation has exceeded the capacities of existing analysis platforms, thereby crippling real-time analysis of viral evolution. Here, we use a new phylogenomic method to search a nearly comprehensive SARS-CoV-2 phylogeny for recombinant lineages.


Identifying SARS-CoV-2 regional introductions and transmission clusters in real time.

Jakob McBroome, Jennifer Martin, Adriano de Bernardi Schneider, Yatish Turakhia, Russell Corbett-Detig

Virus Evol

June 2022

The unprecedented severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) global sequencing effort has suffered from an analytical bottleneck. Many existing methods for phylogenetic analysis are designed for sparse, static datasets and are too computationally expensive to apply to densely sampled, rapidly expanding datasets when results are needed immediately to inform public health action. For example, public health is often concerned with identifying clusters of closely related samples, but the sheer scale of the data prevents manual inspection and the current computational models are often too expensive in time and resources.


matOptimize: a parallel tree optimization method enables online phylogenetics for SARS-CoV-2.

Cheng Ye, Bryan Thornlow, Angie Hinrichs, Alexander Kramer, Cade Mirchandani, Yatish Turakhia

Bioinformatics

August 2022

Motivation: Phylogenetic tree optimization is necessary for precise analysis of evolutionary and transmission dynamics, but existing tools are inadequate for handling the scale and pace of data produced during the coronavirus disease 2019 (COVID-19) pandemic. One transformative approach, online phylogenetics, aims to incrementally add samples to an ever-growing phylogeny, but there are no previously existing approaches that can efficiently optimize this vast phylogeny under the time constraints of the pandemic.

Results: Here, we present matOptimize, a fast and memory-efficient phylogenetic tree optimization tool based on parsimony that can be parallelized across multiple CPU threads and nodes, and provides orders of magnitude improvement in runtime and peak memory usage compared to existing state-of-the-art methods.
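
For readers unfamiliar with parsimony as an optimization criterion, the Python sketch below scores a single site on a small binary tree with the classic Fitch algorithm. It is a textbook illustration under stated assumptions, not matOptimize's algorithm: the nested-tuple tree encoding, the function name `fitch_score`, and the sample names are hypothetical.

```python
def fitch_score(tree, leaf_states):
    """Return (candidate_states, parsimony_score) for one site.

    tree is either a leaf name (str) or a pair (left_subtree, right_subtree).
    leaf_states maps each leaf name to its observed base at this site.
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left, right = tree
    left_states, left_cost = fitch_score(left, leaf_states)
    right_states, right_cost = fitch_score(right, leaf_states)
    shared = left_states & right_states
    if shared:
        # The children can agree on a state, so no extra mutation is needed.
        return shared, left_cost + right_cost
    # The children disagree: keep both options and count one mutation.
    return left_states | right_states, left_cost + right_cost + 1


# Hypothetical samples and bases at one site, for illustration only.
states = {"s1": "A", "s2": "G", "s3": "G", "s4": "A"}
tree_a = (("s1", "s2"), ("s3", "s4"))
tree_b = (("s1", "s4"), ("s2", "s3"))

score_a = fitch_score(tree_a, states)[1]
score_b = fitch_score(tree_b, states)[1]
print(score_a, score_b)
```

A parsimony-based optimizer prefers the topology with the lower total score across all sites, here `tree_b`; searching for such rearrangements on a very large tree is the expensive part.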


Online Phylogenetics using Parsimony Produces Slightly Better Trees and is Dramatically More Efficient for Large SARS-CoV-2 Phylogenies than de novo and Maximum-Likelihood Approaches.

Bryan Thornlow, Alexander Kramer, Cheng Ye, Nicola De Maio, Jakob McBroome, Yatish Turakhia

bioRxiv

May 2022

Phylogenetics has been foundational to SARS-CoV-2 research and public health policy, assisting in genomic surveillance, contact tracing, and assessing emergence and spread of new variants. However, phylogenetic analyses of SARS-CoV-2 have often relied on tools designed for de novo phylogenetic inference, in which all data are collected before any analysis is performed and the phylogeny is inferred once from scratch. SARS-CoV-2 datasets do not fit this mould.


phastSim: Efficient simulation of sequence evolution for pandemic-scale datasets.

Nicola De Maio, William Boulton, Lukas Weilguny, Conor R Walker, Yatish Turakhia

PLoS Comput Biol

April 2022

Sequence simulators are fundamental tools in bioinformatics, as they allow us to test data processing and inference tools, and are an essential component of some inference methods. The ongoing surge in available sequence data is however testing the limits of our bioinformatics software. One example is the large number of SARS-CoV-2 genomes available, which are beyond the processing power of many methods, and simulating such large datasets is also proving difficult.


Maximum likelihood pandemic-scale phylogenetics.

Nicola De Maio, Prabhav Kalaghatgi, Yatish Turakhia, Russell Corbett-Detig, Bui Quang Minh

bioRxiv

July 2022

Phylogenetics plays a crucial role in the interpretation of genomic data. Phylogenetic analyses of SARS-CoV-2 genomes have allowed the detailed study of the virus's origins, of its international and local spread, and of the emergence and reproductive success of new variants, among many applications. These analyses have been enabled by the unparalleled volumes of genome sequence data generated and employed to study and help contain the pandemic.


Champagne: Automated Whole-Genome Phylogenomic Character Matrix Method Using Large Genomic Indels for Homoplasy-Free Inference.

James K Schull, Yatish Turakhia, James A Hemker, William J Dally, Gill Bejerano

Genome Biol Evol

March 2022

We present Champagne, a whole-genome method for generating character matrices for phylogenomic analysis using large genomic indel events. By rigorously picking orthologous genes and locating large insertion and deletion events, Champagne delivers a character matrix that considerably reduces homoplasy compared with morphological and nucleotide-based matrices, on both established phylogenies and difficult-to-resolve nodes in the mammalian tree. Champagne provides ample evidence in the form of genomic structural variation to support incomplete lineage sorting and possible introgression in Paenungulata and human-chimp-gorilla which were previously inferred primarily through matrices composed of aligned single-nucleotide characters.


Pandemic-scale phylogenetics.

Cheng Ye, Bryan Thornlow, Alexander Kramer, Jakob McBroome, Angie Hinrichs, Yatish Turakhia

bioRxiv

December 2021

Phylogenetics has been central to the genomic surveillance, epidemiology and contact tracing efforts during the COVID-19 pandemic. But the massive scale of genomic sequencing has rendered the pre-pandemic tools inadequate for comprehensive phylogenetic analyses. Here, we discuss the phylogenetic package that we developed to address the needs imposed by this pandemic.


A Daily-Updated Database and Tools for Comprehensive SARS-CoV-2 Mutation-Annotated Trees.

Jakob McBroome, Bryan Thornlow, Angie S Hinrichs, Alexander Kramer, Nicola De Maio, Yatish Turakhia

Mol Biol Evol

December 2021

The vast scale of SARS-CoV-2 sequencing data has made it increasingly challenging to comprehensively analyze all available data using existing tools and file formats. To address this, we present a database of SARS-CoV-2 phylogenetic trees inferred with unrestricted public sequences, which we update daily to incorporate new sequences. Our database uses the recently proposed mutation-annotated tree (MAT) format to efficiently encode the tree with branches labeled with parsimony-inferred mutations, as well as Nextstrain clade and Pango lineage labels at clade roots.
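
The idea of a mutation-annotated tree can be sketched in a few lines of Python: each node stores only the mutations inferred on the branch leading to it, and a sample's full set of mutations is recovered by walking from the root to its leaf. This is an in-memory illustration under stated assumptions, not the actual MAT file format; the class `MatNode`, both helper functions, and all node names, mutations, and clade labels are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MatNode:
    name: str
    # Mutations on the branch leading to this node, e.g. "A10G".
    mutations: List[str] = field(default_factory=list)
    parent: Optional["MatNode"] = None
    # Lineage or clade label, set only on the node that roots the clade.
    clade: Optional[str] = None


def mutations_from_root(node):
    """Collect branch mutations along the path from the root down to node."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    collected = []
    for step in reversed(path):
        collected.extend(step.mutations)
    return collected


def nearest_clade(node):
    """Return the label of the closest labelled ancestor (or node itself)."""
    while node is not None:
        if node.clade is not None:
            return node.clade
        node = node.parent
    return None


# Toy tree with made-up mutations and labels, for illustration only.
root = MatNode("root")
internal = MatNode("node_1", ["A10G"], parent=root, clade="clade_X")
leaf = MatNode("sample_1", ["C25T"], parent=internal)

print(mutations_from_root(leaf), nearest_clade(leaf))
```

Storing only per-branch differences is what keeps the encoding compact when most samples differ from their neighbours by a handful of mutations.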


Ultrafast Sample placement on Existing tRees (UShER) enables real-time phylogenetics for the SARS-CoV-2 pandemic.

Yatish Turakhia, Bryan Thornlow, Angie S Hinrichs, Nicola De Maio, Landen Gozashti

Nat Genet

June 2021

As the SARS-CoV-2 virus spreads through human populations, the unprecedented accumulation of viral genome sequences is ushering in a new era of 'genomic contact tracing', that is, using viral genomes to trace local transmission dynamics. However, because the viral phylogeny is already so large (and will undoubtedly grow many fold), placing new sequences onto the tree has emerged as a barrier to real-time genomic contact tracing. Here, we resolve this challenge by building an efficient tree-based data structure encoding the inferred evolutionary history of the virus.


Mutation Rates and Selection on Synonymous Mutations in SARS-CoV-2.

Nicola De Maio, Conor R Walker, Yatish Turakhia, Robert Lanfear, Russell Corbett-Detig

Genome Biol Evol

May 2021

The COVID-19 pandemic has seen an unprecedented response from the sequencing community. Leveraging the sequence data from more than 140,000 SARS-CoV-2 genomes, we study mutation rates and selective pressures affecting the virus. Understanding the processes and effects of mutation and selection has profound implications for the study of viral evolution, for vaccine design, and for the tracking of viral spread.


A new SARS-CoV-2 lineage that shares mutations with known Variants of Concern is rejected by automated sequence repository quality control.

Bryan Thornlow, Angie S Hinrichs, Miten Jain, Namrita Dhillon, Scott La, Yatish Turakhia

bioRxiv

April 2021

We report a SARS-CoV-2 lineage that shares N501Y, P681H, and other mutations with known variants of concern, such as B.1.1.


A daily-updated database and tools for comprehensive SARS-CoV-2 mutation-annotated trees.

Jakob McBroome, Bryan Thornlow, Angie S Hinrichs, Nicola De Maio, Nick Goldman, Yatish Turakhia

bioRxiv

July 2021

The vast scale of SARS-CoV-2 sequencing data has made it increasingly challenging to comprehensively analyze all available data using existing tools and file formats. To address this, we present a database of SARS-CoV-2 phylogenetic trees inferred with unrestricted public sequences, which we update daily to incorporate new sequences. Our database uses the recently-proposed mutation-annotated tree (MAT) format to efficiently encode the tree with branches labeled with parsimony-inferred mutations as well as Nextstrain clade and Pango lineage labels at clade roots.


phastSim: efficient simulation of sequence evolution for pandemic-scale datasets.

Nicola De Maio, William Boulton, Lukas Weilguny, Conor R Walker, Yatish Turakhia

bioRxiv

September 2021

Sequence simulators are fundamental tools in bioinformatics, as they allow us to test data processing and inference tools, as well as being part of some inference methods. The ongoing surge in available sequence data is however testing the limits of our bioinformatics software. One example is the large number of SARS-CoV-2 genomes available, which are beyond the processing power of many methods, and simulating such large datasets is also proving difficult.
