Publications by Anton Nekrutenko | LitMetric

Publications by authors named "Anton Nekrutenko"


Minus the Error: Estimating dN/dS and Testing for Natural Selection in the Presence of Residual Alignment Errors.

Avery G Selberg Maria Chikina Tim Sackton Spencer V Muse Alexander G Lucaci Anton Nekrutenko

bioRxiv

November 2024

Errors in multiple sequence alignments (MSAs) are known to bias many comparative evolutionary methods. In the context of natural selection analyses, specifically codon evolutionary models, excessive rates of false positives result. A characteristic signature of error-driven findings is unrealistically high estimates of dN/dS (e.


KegAlign: Optimizing pairwise alignments with diagonal partitioning.

A Burak Gulhan Richard Burhans Robert Harris Mahmut Kandemir Maximilian Haeussler Anton Nekrutenko

bioRxiv

September 2024

Our ability to generate sequencing data and assemble it into high quality complete genomes has rapidly advanced in recent years. These data promise to advance our understanding of organismal biology and answer longstanding evolutionary questions. Multiple genome alignment is a key tool in this quest.


Scalable, accessible and reproducible reference genome assembly and evaluation in Galaxy.

Delphine Larivière Linelle Abueg Nadolina Brajuka Cristóbal Gallardo-Alba Bjorn Grüning Anton Nekrutenko

Nat Biotechnol

March 2024


Scalable, accessible, and reproducible reference genome assembly and evaluation in Galaxy.

Delphine Larivière Linelle Abueg Nadolina Brajuka Cristóbal Gallardo-Alba Bjorn Grüning Anton Nekrutenko

bioRxiv

June 2023

Improvements in genome sequencing and assembly are enabling high-quality reference genomes for all species. However, the assembly process is still laborious, computationally and technically demanding, lacks standards for reproducibility, and is not readily scalable. Here we present the latest Vertebrate Genomes Project assembly pipeline and demonstrate that it delivers high-quality reference genomes at scale across a set of vertebrate species arising over the last ~500 million years.


Fast and accurate genome-wide predictions and structural modeling of protein-protein interactions using Galaxy.

Aysam Guerler Dannon Baker Marius van den Beek Bjoern Gruening Dave Bouvier Anton Nekrutenko

BMC Bioinformatics

June 2023

Article Synopsis

- The text discusses the importance of protein-protein interactions in cellular processes and how identifying these interactions can lead to new drug targets for diseases.
- An automated pipeline was developed to predict protein-protein interactions across genomes, demonstrating success in modeling interactions in both human and yeast proteins, particularly in relation to SARS-CoV-2.
- The method produces reliable interaction models that can be experimentally validated, and the pipeline is publicly accessible at specific Galaxy platforms.


The Planemo toolkit for developing, deploying, and executing scientific data analyses in Galaxy and beyond.

Simon Bray John Chilton Matthias Bernt Nicola Soranzo Marius van den Beek Anton Nekrutenko

Genome Res

February 2023

There are thousands of well-maintained high-quality open-source software utilities for all aspects of scientific data analysis. For more than a decade, the Galaxy Project has been providing computational infrastructure and a unified user interface for these tools to make them accessible to a wide range of researchers. To streamline the process of integrating tools and constructing workflows as much as possible, we have developed Planemo, a software development kit for tool and workflow developers and Galaxy power users.


RASCL: Rapid Assessment of Selection in CLades through molecular sequence analysis.

Alexander G Lucaci Jordan D Zehr Stephen D Shank Dave Bouvier Alexander Ostrovsky Anton Nekrutenko

PLoS One

November 2022

An important unmet need revealed by the COVID-19 pandemic is the near-real-time identification of potentially fitness-altering mutations within rapidly growing SARS-CoV-2 lineages. Although powerful molecular sequence analysis methods are available to detect and characterize patterns of natural selection within modestly sized gene-sequence datasets, the computational complexity of these methods and their sensitivity to sequencing errors render them effectively inapplicable in large-scale genomic surveillance contexts. Motivated by the need to analyze new lineage evolution in near-real time using large numbers of genomes, we developed the Rapid Assessment of Selection within CLades (RASCL) pipeline.


Detection of SARS-CoV-2 intra-host recombination during superinfection with Alpha and Epsilon variants in New York City.

Joel O Wertheim Jade C Wang Mindy Leelawong Darren P Martin Jennifer L Havens Anton Nekrutenko

Nat Commun

June 2022

Article Synopsis

- In this case, an individual was superinfected with two SARS-CoV-2 variants, Alpha (B.1.1.7) and Epsilon (B.1.429), which led to unexpected genomic characteristics in the Alpha variant.
- Full genome sequencing indicated that the Alpha variant made up about 75% of the viral presence, with the Epsilon variant at around 20%, and revealed multiple recombinant forms that could influence the virus's evolution.


Selection Analysis Identifies Clusters of Unusual Mutational Changes in Omicron Lineage BA.1 That Likely Impact Spike Function.

Darren P Martin Spyros Lytras Alexander G Lucaci Wolfgang Maier Björn Grüning Anton Nekrutenko

Mol Biol Evol

April 2022

Among the 30 nonsynonymous nucleotide substitutions in the Omicron S-gene are 13 that have only rarely been seen in other SARS-CoV-2 sequences. These mutations cluster within three functionally important regions of the S-gene at sites that will likely impact (1) interactions between subunits of the Spike trimer and the predisposition of subunits to shift from down to up configurations, (2) interactions of Spike with ACE2 receptors, and (3) the priming of Spike for membrane fusion. We show here that, based on both the rarity of these 13 mutations in intrapatient sequencing reads and patterns of selection at the codon sites where the mutations occur in SARS-CoV-2 and related sarbecoviruses, prior to the emergence of Omicron the mutations would have been predicted to decrease the fitness of any virus within which they occurred.


Inverting the model of genomics data sharing with the NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space.

Michael C Schatz Anthony A Philippakis Enis Afgan Eric Banks Vincent J Carey Anton Nekrutenko

Cell Genom

January 2022

The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL; https://anvilproject.org) was developed to address a widespread community need for a unified computing environment for genomics data storage, management, and analysis. In this perspective, we present AnVIL, describe its ecosystem and interoperability with other platforms, and highlight how this platform and associated initiatives contribute to improved genomic data sharing efforts.


RASCL: Rapid Assessment Of SARS-CoV-2 Clades Through Molecular Sequence Analysis.

Alexander G Lucaci Jordan D Zehr Stephen D Shank Dave Bouvier Han Mei Anton Nekrutenko

bioRxiv

January 2022

An important component of efforts to manage the ongoing COVID-19 pandemic is the Rapid Assessment of how natural selection contributes to the emergence and proliferation of potentially dangerous SARS-CoV-2 lineages and CLades (RASCL). The RASCL pipeline enables continuous comparative phylogenetics-based selection analyses of rapidly growing clade-focused genome surveillance datasets, such as those produced following the initial detection of potentially dangerous variants. From such datasets RASCL automatically generates down-sampled codon alignments of individual genes/ORFs containing contextualizing background reference sequences, analyzes these with a battery of selection tests, and outputs results as both machine-readable JSON files and interactive notebook-based visualizations.


Selection analysis identifies unusual clustered mutational changes in Omicron lineage BA.1 that likely impact Spike function.

Darren P Martin Spyros Lytras Alexander G Lucaci Wolfgang Maier Björn Grüning Anton Nekrutenko

bioRxiv

January 2022

Among the 30 non-synonymous nucleotide substitutions in the Omicron S-gene are 13 that have only rarely been seen in other SARS-CoV-2 sequences. These mutations cluster within three functionally important regions of the S-gene at sites that will likely impact (i) interactions between subunits of the Spike trimer and the predisposition of subunits to shift from down to up configurations, (ii) interactions of Spike with ACE2 receptors, and (iii) the priming of Spike for membrane fusion. We show here that, based on both the rarity of these 13 mutations in intrapatient sequencing reads and patterns of selection at the codon sites where the mutations occur in SARS-CoV-2 and related sarbecoviruses, prior to the emergence of Omicron the mutations would have been predicted to decrease the fitness of any genomes within which they occurred.


Ready-to-use public infrastructure for global SARS-CoV-2 monitoring.

Wolfgang Maier Simon Bray Marius van den Beek Dave Bouvier Nathan Coraor Anton Nekrutenko

Nat Biotechnol

October 2021


Stepwise Evolution and Exceptional Conservation of ORF1a/b Overlap in Coronaviruses.

Han Mei Sergei Kosakovsky Pond Anton Nekrutenko

Mol Biol Evol

December 2021

The programmed frameshift element (PFE) rerouting translation from ORF1a to ORF1b is essential for the propagation of coronaviruses. The combination of genomic features that make up the PFE (the overlap between the two reading frames, a slippery sequence, and an ensemble of complex secondary structure elements) places severe constraints on this region, as most possible nucleotide substitutions may disrupt one or more of these elements. The vast amount of SARS-CoV-2 sequencing data generated within the past year provides an opportunity to assess the evolutionary dynamics of the PFE in great detail.


Stepwise evolution and exceptional conservation of ORF1a/b overlap in coronaviruses.

Han Mei Anton Nekrutenko

bioRxiv

June 2021

The programmed frameshift element (PFE) rerouting translation from ORF1a to ORF1b is essential for propagation of coronaviruses. A combination of genomic features that make up the PFE (the overlap between the two reading frames, a slippery sequence, and an ensemble of complex secondary structure elements) puts severe constraints on this region, as most possible nucleotide substitutions may disrupt one or more of these elements. The vast amount of SARS-CoV-2 sequencing data generated within the past year provides an opportunity to assess the evolutionary dynamics of the PFE in great detail.


Reproducible and accessible analysis of transposon insertion sequencing in Galaxy for qualitative essentiality analyses.

Delphine Larivière Laura Wickham Kenneth Keiler Anton Nekrutenko

BMC Microbiol

June 2021

Background: Significant progress has been made in advancing and standardizing tools for human genomic and biomedical research. Yet, the field of next-generation sequencing (NGS) analysis for microorganisms (including multiple pathogens) remains fragmented, lacks accessible and reusable tools, is hindered by local computational resource limitations, and does not offer widely accepted standards. One such "problem area" is the analysis of Transposon Insertion Sequencing (TIS) data.


Fostering accessible online education using Galaxy as an e-learning platform.

Beatriz Serrano-Solano Melanie C Föll Cristóbal Gallardo-Alba Anika Erxleben Helena Rasche Anton Nekrutenko

PLoS Comput Biol

May 2021

The COVID-19 pandemic is shifting teaching to an online setting all over the world. The Galaxy framework facilitates the online learning process and makes it accessible by providing a library of high-quality community-curated training materials, enabling easy access to data and tools, and facilitating the sharing of achievements and progress between students and instructors. By combining Galaxy with robust communication channels, effective instruction can be designed inclusively, regardless of the students' environments.


Sequencing error profiles of Illumina sequencing instruments.

Nicholas Stoler Anton Nekrutenko

NAR Genom Bioinform

March 2021

Sequencing technology has achieved great advances in the past decade. Studies have previously shown the quality of specific instruments in controlled conditions. Here, we developed a method able to retroactively determine the error rate of most public sequencing datasets.


Freely accessible ready to use global infrastructure for SARS-CoV-2 monitoring.

Wolfgang Maier Simon Bray Marius van den Beek Dave Bouvier Nathaniel Coraor Anton Nekrutenko

bioRxiv

March 2021

The COVID-19 pandemic is the first global health crisis to occur in the age of big genomic data. Although data generation capacity is well established and sufficiently standardized, analytical capacity is not. To establish analytical capacity it is necessary to pull together global computational resources and deliver the best open source tools and analysis workflows within a ready to use, universally accessible resource.


Erratum: Increased yields of duplex sequencing data by a series of quality control tools.

Gundula Povysil Monika Heinzl Renato Salazar Nicholas Stoler Anton Nekrutenko

NAR Genom Bioinform

March 2021

[This corrects the article DOI: 10.1093/nargab/lqab002.].


Using Galaxy to Perform Large-Scale Interactive Data Analyses-An Update.

Alexander Ostrovsky Jennifer Hillman-Jackson Dave Bouvier Dave Clements Enis Afgan Anton Nekrutenko

Curr Protoc

February 2021

Article Synopsis

- Modern biology is increasingly reliant on computational methods to handle the large and complex datasets that are emerging, posing a challenge for experimental biologists who may lack computational skills.
- Galaxy is a web-based platform that provides access to a variety of computational biology tools and public biological data repositories, allowing users to blend private and public datasets.
- The article offers detailed protocols for using Galaxy to conduct specific biological analyses, including finding human coding exons, analyzing ChIP-seq data, comparing datasets, and working with RNA-seq.


Increased yields of duplex sequencing data by a series of quality control tools.

Gundula Povysil Monika Heinzl Renato Salazar Nicholas Stoler Anton Nekrutenko

NAR Genom Bioinform

March 2021

Duplex sequencing is currently the most reliable method to identify ultra-low frequency DNA variants by grouping sequence reads derived from the same DNA molecule into families with information on the forward and reverse strand. However, only a small proportion of reads are assembled into duplex consensus sequences (DCS), and reads with potentially valuable information are discarded at different steps of the bioinformatics pipeline, especially reads without a family. We developed a bioinformatics toolset that analyses the tag and family composition with the purpose to understand data loss and implement modifications to maximize the data output for the variant calling.


A single-cell RNA-sequencing training and analysis suite using the Galaxy framework.

Mehmet Tekman Bérénice Batut Alexander Ostrovsky Christophe Antoniewski Dave Clements Anton Nekrutenko

Gigascience

October 2020

Background: The vast ecosystem of single-cell RNA-sequencing tools has until recently been plagued by an excess of diverging analysis strategies, inconsistent file formats, and compatibility issues between different software suites. The uptake of 10x Genomics datasets has begun to calm this diversity, and the bioinformatics community leans once more towards the large computing requirements and the statistically driven methods needed to process and understand these ever-growing datasets.

Results: Here we outline several Galaxy workflows and learning resources for single-cell RNA-sequencing, with the aim of providing a comprehensive analysis environment paired with a thorough user learning experience that bridges the knowledge gap between the computational methods and the underlying cell biology.


No more business as usual: Agile and effective responses to emerging pathogen threats require open data and open analytics.

Dannon Baker Marius van den Beek Daniel Blankenberg Dave Bouvier John Chilton Anton Nekrutenko

PLoS Pathog

August 2020

The current state of much of the Wuhan pneumonia virus (severe acute respiratory syndrome coronavirus 2 [SARS-CoV-2]) research shows a regrettable lack of data sharing and considerable analytical obfuscation. This impedes global research cooperation, which is essential for tackling public health emergencies and requires unimpeded access to data, analysis tools, and computational infrastructure. Here, we show that community efforts in developing open analytical software tools over the past 10 years, combined with national investments into scientific computational infrastructure, can overcome these deficiencies and provide an accessible platform for tackling global health emergencies in an open and transparent manner.


Corrigendum: The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2020 update.

Vahid Jalili Enis Afgan Qiang Gu Dave Clements Daniel Blankenberg Anton Nekrutenko

Nucleic Acids Res

August 2020
