Publications by authors named "Anton Korobeynikov"

Motivation: Recent benchmarks of structural variant (SV) detection tools revealed that the majority of human genome SVs, especially the medium-range (50-10,000 bp) SVs, cannot be resolved with short-read sequencing, but long-read SV callers achieve great results on the same datasets. While improvements have been made, high-coverage long-read sequencing is associated with higher costs and input DNA requirements. To decrease the cost, one can lower the sequence coverage, but the current long-read SV callers perform poorly with coverage below 10×.


Background: Recent advances in long-read sequencing technologies enabled accurate and contiguous assemblies of large genomes and metagenomes. However, even long and accurate high-fidelity (HiFi) reads do not resolve repeats that are longer than the read lengths. This limitation negatively affects the contiguity of diploid genome assemblies since two haplomes share many long identical regions.

Article Synopsis
  • Sequence-based analysis of the microbiomes in fermented foods and beverages sheds light on their effects on taste and health, using advanced metagenomics techniques for detailed profiling.
  • A new Hi-C metagenomics pipeline was developed for analyzing spontaneously fermented beers and ciders, resulting in improved genome reconstruction of the bacterial and yeast populations involved.
  • Findings revealed significant diversity in microbial communities, particularly in Lactobacillaceae for beers and Brettanomyces and Saccharomyces for ciders, highlighting potential health benefits and niche adaptations of these organisms within their environments.

A recently published article in BMC Genomics by Fuentes-Trillo et al. compares assembly approaches for several noroviral samples using different tools and preprocessing strategies. It turned out that the study used outdated versions of tools as well as tools that were not designed for the viral assembly task.


The analysis of metagenomic data obtained via high-throughput DNA sequencing is primarily carried out by a dedicated binning process that clusters contigs presumably belonging to the same species. Here, we present a protocol for improving the quality of binning using BinSPreader. We describe the steps of a typical metagenome assembly and binning workflow.


While metagenome sequencing may provide insights into the genome sequences and composition of microbial communities, metatranscriptome analysis can be useful for studying the functional activity of a microbiome. RNA-Seq data makes it possible to determine which genes are active in the community and how their expression levels depend on external conditions. Although the field of metatranscriptomics is relatively young, the number of projects related to metatranscriptome analysis increases every year and the scope of its applications expands.


Despite the recent advances in high-throughput sequencing, metagenome analysis of microbial populations remains a challenge. In particular, the metagenome-assembled genomes (MAGs) are often fragmented due to interspecies repeats, uneven coverage, and varying strain abundance. MAGs are constructed via a binning process that uses features of the input data to cluster long contigs presumably belonging to the same species.

Article Synopsis
  • Evaluating metagenomic software is crucial for enhancing the interpretation of metagenomes, and the CAMI II challenge focused on this by using complex datasets from numerous genomes and plasmids.
  • The analysis of 5,002 results from 76 software versions showed significant advancements in assembly, especially with long-read data, although challenges remained with related strains and genome recovery.
  • Findings indicated that while taxon profilers improved, they struggled with viruses and Archaea, highlighting the need for better reproducibility in clinical pathogen detection and guiding researchers in method selection based on efficiency and performance metrics.

Gut microbiome in critically ill patients shows profound dysbiosis. The most vulnerable is the subgroup of chronically critically ill (CCI) patients - those suffering from long-term dependence on support systems in intensive care units. It is important to investigate their microbiome as a potential reservoir of opportunistic taxa causing co-infections and a morbidity factor.

Article Synopsis
  • Public databases hold vast collections of nucleic acid sequences, exceeding 20 petabases, but efficient searching methods have been lacking.
  • The authors developed Serratus, a cloud computing system that allows for ultra-high-throughput sequence alignment, enabling them to search over 10 petabases of data and discover more than 100,000 new RNA viruses.
  • By characterizing these novel viruses and creating a free database, the study aims to enhance viral discovery and aid in understanding the origins of emerging pathogens, ultimately improving pandemic preparedness.

Microbial communities might include distinct lineages of closely related organisms that complicate metagenomic assembly and prevent the generation of complete metagenome-assembled genomes (MAGs). Here we show that deep sequencing using long (HiFi) reads combined with Hi-C binning can address this challenge even for complex microbial communities. Using existing methods, we sequenced the sheep fecal metagenome and identified 428 MAGs with more than 90% completeness, including 44 MAGs in single circular contigs.


The lack of control over the usage of antibiotics leads to propagation of the microbial strains that are resistant to many antimicrobial substances. This situation is an emerging threat to public health and therefore the development of approaches to infer the presence of resistant strains is a topic of high importance. The resistome construction of an isolate microbial species could be considered a solved task with many state-of-the-art tools available.


Microbial natural products are a major source of bioactive compounds for drug discovery. Among these molecules, nonribosomal peptides (NRPs) represent a diverse class of natural products that include antibiotics, immunosuppressants, and anticancer agents. Recent breakthroughs in natural product discovery have revealed the chemical structure of several thousand NRPs.


Motivation: The COVID-19 pandemic has ignited a broad scientific interest in viral research in general and coronavirus research in particular. The identification and characterization of viral species in natural reservoirs typically involves de novo assembly. However, existing genome, metagenome and transcriptome assemblers often are not able to assemble many viruses (including coronaviruses) into a single contig.


Metagenomics is a segment of conventional microbial genomics dedicated to the sequencing and analysis of the combined genomic DNA of entire environmental samples. The most critical step of metagenomic data analysis is the reconstruction of individual genes and genomes of the microorganisms in the communities using metagenomic assemblers: computational programs that put together small fragments of sequenced DNA generated by sequencing instruments. Here, we describe the challenges of metagenomic assembly and a wide spectrum of applications in which metagenomic assemblies were used to better understand the ecology and evolution of microbial ecosystems, and we present one of the most efficient microbial assemblers, SPAdes, which was upgraded to become applicable to metagenomics.


Background: Graph-based representation of genome assemblies has recently been used in different contexts, from improved reconstruction of plasmid sequences and refined analysis of metagenomic data to read error correction and reference-free haplotype reconstruction. While many of these applications rely heavily on the alignment of long nucleotide sequences to assembly graphs, the first general-purpose software tools for finding such alignments have been released only recently, and their deficiencies and limitations are yet to be discovered. Moreover, existing tools cannot align amino acid sequences, which could prove useful in various contexts, in particular the analysis of metagenomic sequencing data.


Background: Illumina paired-end reads are often used for 16S analysis in metagenomic studies. Since the DNA fragment size is usually smaller than the sum of the lengths of the paired reads, the reads can be merged for downstream analysis. Despite the development of several tools for merging paired-end reads, poor quality at the 3' ends within the overlapping region prevents the accurate merging of a significant portion of read pairs.


SPAdes (St. Petersburg genome Assembler) was originally developed for de novo assembly of genome sequencing data produced for cultivated microbial isolates and for single-cell genomic DNA sequencing. With time, the functionality of SPAdes was extended to enable assembly of IonTorrent data, as well as hybrid assembly from short and long reads (PacBio and Oxford Nanopore).

Article Synopsis
  • Ribosomally synthesized and post-translationally modified peptides (RiPPs) are significant natural products that include antibiotics and various bioactive compounds.
  • Current discovery methods for RiPPs are limited and ineffective at identifying unknown modifications in larger datasets.
  • MetaMiner is a new software tool that successfully identified 31 known and 7 unknown RiPPs from diverse microbial sources by analyzing millions of spectra from large genomic databases.

Predicting biosynthetic gene clusters (BGCs) is critically important for discovery of antibiotics and other natural products. While BGC prediction from complete genomes is a well-studied problem, predicting BGCs in fragmented genomic assemblies remains challenging. The existing BGC prediction tools often assume that each BGC is encoded within a single contig in the genome assembly, a condition that is violated for most sequenced microbial genomes where BGCs are often scattered through several contigs, making it difficult to reconstruct them.


Dozens of type A malyngamides, principally identified by a decorated six-membered cyclohexanone headgroup and methoxylated lyngbic acid tail, have been isolated over several decades. Their environmental sources include macro- and microbiotic organisms, including sea hares, red alga, and cyanobacterial assemblages, but the true producing organism has remained enigmatic. Many type A analogues display potent bioactivity in human-health related assays, spurring an interest in this molecular class and its biosynthetic pathway.


We describe a fast and accurate algorithm for IonTorrent read error correction capable of significantly reducing the number of sequencing errors over a wide range of data sets. IonHammer is implemented in C++ and is freely available as part of the SPAdes genome assembler package.
