The increasingly widespread application of next-generation sequencing (NGS) in clinical diagnostics and epidemiological research has generated a demand for robust, fast, automated, and user-friendly bioinformatics workflows. To guide the choice of tools for the assembly of full-length viral genomes from NGS datasets, we assessed the performance and applicability of four open-source bioinformatics pipelines (shiver, for which we created a user-friendly Dockerized version referred to as dshiver; SmaltAlign; viral-ngs; and V-pipe) using both simulated and real-world HIV-1 paired-end short-read datasets and default settings. All four pipelines produced consensus genome assemblies with high quality metrics (genome fraction recovery, mismatch and indel rates, variant calling F1 scores) when the reference sequence used for assembly had high similarity to the analyzed sample. The shiver and SmaltAlign pipelines (but not viral-ngs and V-pipe) also showed robust performance with more divergent samples (non-matching subtypes). With empirical datasets, the runtimes of SmaltAlign and viral-ngs were an order of magnitude shorter than those of V-pipe and shiver. In terms of applicability, V-pipe provides the broadest functionality, SmaltAlign and dshiver combine user-friendliness with robustness, and viral-ngs requires fewer computational resources than the other pipelines. In conclusion, if a closely matched reference sequence is available, all pipelines can reliably reconstruct viral consensus genomes; therefore, differences in user-friendliness and runtime may guide the choice of pipeline in a particular setting. If a matched reference sequence cannot be selected, we recommend shiver or SmaltAlign for robust performance. The new Dockerized version of shiver offers ease of use in addition to the accuracy and robustness of the original pipeline.
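
The quality metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names, the input formats (variants as `(position, base)` pairs; a gapped pairwise alignment of a known true genome against the pipeline consensus), and the exact metric definitions are assumptions chosen for illustration, and the study's own definitions may differ.

```python
def variant_f1(called, truth):
    """F1 score of a called variant set against a truth set.

    Both arguments are iterables of hashable variants, e.g. (position, base).
    This representation is an assumption made for this sketch.
    """
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # variants called and truly present
    fp = len(called - truth)   # variants called but not present
    fn = len(truth - called)   # variants present but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def consensus_metrics(truth_aln, cons_aln):
    """Illustrative consensus metrics from a gapped pairwise alignment.

    '-' marks an alignment gap; 'N' in the consensus marks an uncalled base.
    Definitions used here (assumptions, not the study's):
      genome_fraction = called consensus bases / true genome length
      mismatch_rate   = mismatching bases / called consensus bases
      indel_rate      = inserted or deleted positions / true genome length
    """
    if len(truth_aln) != len(cons_aln):
        raise ValueError("aligned sequences must have equal length")
    truth_len = covered = mismatches = indels = 0
    for t, c in zip(truth_aln.upper(), cons_aln.upper()):
        if t == "-" and c == "-":
            continue                 # gap-only column carries no information
        if t == "-":
            indels += 1              # insertion in the consensus
            continue
        truth_len += 1
        if c == "-":
            indels += 1              # deletion in the consensus
        elif c != "N":
            covered += 1
            if c != t:
                mismatches += 1
    return {
        "genome_fraction": covered / truth_len if truth_len else 0.0,
        "mismatch_rate": mismatches / covered if covered else 0.0,
        "indel_rate": indels / truth_len if truth_len else 0.0,
    }


if __name__ == "__main__":
    print(variant_f1({(100, "A"), (200, "G"), (300, "T")},
                     {(100, "A"), (200, "G"), (400, "C")}))
    print(consensus_metrics("ACGT-ACGT", "ACCTTAC-N"))
```

Under these definitions, a pipeline given a divergent reference would be expected to show a lower genome fraction or higher mismatch and indel rates, which is the kind of difference the comparison between matched and non-matching subtypes is designed to detect.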

Source: http://dx.doi.org/10.3390/v16121824
