Background: Assembly algorithm choice should be a deliberate, well-justified decision when researchers create genome assemblies for eukaryotic organisms from third-generation sequencing technologies. While third-generation sequencing by Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) has overcome the disadvantages of short read lengths specific to next-generation sequencing (NGS), third-generation sequencers are known to produce more error-prone reads, thereby generating a new set of challenges for assembly algorithms and pipelines. However, the introduction of HiFi reads, which offer substantially reduced error rates, has provided a promising solution for more accurate assembly outcomes. Since the introduction of third-generation sequencing technologies, many tools have been developed that aim to take advantage of the longer reads, and researchers need to choose the correct assembler for their projects.
Results: We benchmarked state-of-the-art long-read de novo assemblers to help readers make a balanced choice for the assembly of eukaryotes. To this end, we used 12 real and 64 simulated datasets from different eukaryotic genomes, with different read length distributions, imitating PacBio continuous long-read (CLR), PacBio high-fidelity (HiFi), and ONT sequencing to evaluate the assemblers. For ONT and PacBio CLR reads, we include 5 commonly used long-read assemblers in our benchmark: Canu, Flye, Miniasm, Raven, and wtdbg2. For PacBio HiFi reads, we include 5 state-of-the-art HiFi assemblers: HiCanu, Flye, Hifiasm, LJA, and MBG. The evaluation covers the following categories: reference-based metrics, assembly statistics, misassembly count, BUSCO completeness, runtime, and RAM usage. Additionally, we investigated the effect of increased read length on the quality of the assemblies and report that read length can, but does not always, positively impact assembly quality.
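As an illustration of the "assembly statistics" category, the sketch below computes N50, a contiguity statistic commonly reported for assemblies. The abstract does not state which statistics were used, so the choice of N50, the function name `n50`, and the example contig lengths are assumptions made here for illustration only.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    Returns 0 for an empty input (an assumption of this sketch).
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum to half the total is the N50.
        if running >= half_total:
            return length
    return 0


if __name__ == "__main__":
    # Hypothetical contig lengths, not taken from the benchmark.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

A higher N50 indicates a more contiguous assembly, but it says nothing about correctness, which is why the benchmark also reports misassembly counts and reference-based metrics.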
Conclusions: Our benchmark shows that no assembler performs best in all evaluation categories. However, our results show that overall Flye is the best-performing assembler for PacBio CLR and ONT reads, on both real and simulated data. For PacBio HiFi reads, the best-performing assemblers are Hifiasm and LJA. Finally, benchmarking with longer reads shows that increased read length can improve assembly quality, but the extent of the improvement depends on the size and complexity of the reference genome.
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10673639 | PMC |
http://dx.doi.org/10.1093/gigascience/giad100 | DOI Listing |