Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.

Genome Res

Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA.

Published: May 2017

AI Article Synopsis

  • Long-read single-molecule sequencing enables better genome assembly, but its relatively high error rates make accurate assembly of large repeats and closely related haplotypes challenging.
  • Canu improves on its predecessor, Celera Assembler, by adding support for nanopore sequencing, halving depth-of-coverage requirements, and reducing assembly time for large genomes by an order of magnitude.
  • Using new overlapping and assembly algorithms, Canu reliably assembles complete microbial genomes and near-complete eukaryotic chromosomes, and it provides graph-based outputs that can be integrated with complementary phasing and scaffolding techniques.

Article Abstract

Long-read single-molecule sequencing has revolutionized de novo genome assembly and enabled the automated reconstruction of reference-quality genomes. However, given the relatively high error rates of such technologies, efficient and accurate assembly of large repeats and closely related haplotypes remains challenging. We address these issues with Canu, a successor of Celera Assembler that is specifically designed for noisy single-molecule sequences. Canu introduces support for nanopore sequencing, halves depth-of-coverage requirements, and improves assembly continuity while simultaneously reducing runtime by an order of magnitude on large genomes versus Celera Assembler 8.2. These advances result from new overlapping and assembly algorithms, including an adaptive overlapping strategy based on weighted MinHash and a sparse assembly graph construction that avoids collapsing diverged repeats and haplotypes. We demonstrate that Canu can reliably assemble complete microbial genomes and near-complete eukaryotic chromosomes using either Pacific Biosciences (PacBio) or Oxford Nanopore technologies and achieves a contig NG50 of >21 Mbp on both human and Drosophila melanogaster PacBio data sets. For assembly structures that cannot be linearly represented, Canu provides graph-based assembly outputs in graphical fragment assembly (GFA) format for analysis or integration with complementary phasing and scaffolding techniques. The combination of such highly resolved assembly graphs with long-range scaffolding information promises the complete and automated assembly of complex genomes.
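The abstract credits Canu's adaptive overlapping to weighted MinHash, in which repetitive k-mers contribute less to read sketches. The sketch below is a toy Python illustration of that general idea, not Canu's or MHAP's code. Everything specific here is an assumption: the idf-like weighting function `kmer_weight`, its cap of 3, the use of per-read k-mer counts as a stand-in for genome-wide k-mer frequency, and all helper names are hypothetical. Integer weights are handled with the standard reduction of expanding each k-mer into `weight` distinctly labeled copies, so that ordinary MinHash over the expanded set estimates the weighted Jaccard similarity.

```python
import hashlib
import math
from collections import Counter


def kmers(seq: str, k: int) -> set[str]:
    """Distinct k-mers of a read (reverse complements ignored for brevity)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def count_kmers(reads: list[str], k: int) -> Counter:
    """Hypothetical helper: number of reads containing each k-mer,
    used here as a toy proxy for genome-wide k-mer frequency."""
    counts: Counter = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def kmer_weight(count: int, max_count: int, max_weight: int = 3) -> int:
    """Hypothetical idf-like weight: the most frequent (repeat-like) k-mers get
    weight 1, rarer k-mers get larger integer weights, capped at max_weight."""
    count = max(count, 1)  # guard against k-mers missing from the counts
    return max(1, min(max_weight, 1 + int(math.log2(max_count / count))))


def _hash(seed: int, item: str) -> int:
    """64-bit hash of an item under the seed-th hash function."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def weighted_sketch(seq: str, k: int, counts: Counter, num_hashes: int = 256) -> list[int]:
    """MinHash sketch of the multiset in which each k-mer appears `weight` times.
    Labeling copies "kmer#0", "kmer#1", ... turns integer-weighted MinHash into
    ordinary MinHash, so agreeing minima estimate the weighted Jaccard similarity."""
    max_count = max(counts.values())
    items = [f"{kmer}#{copy}"
             for kmer in kmers(seq, k)
             for copy in range(kmer_weight(counts[kmer], max_count))]
    return [min(_hash(seed, item) for item in items) for seed in range(num_hashes)]


def estimate_similarity(sketch_a: list[int], sketch_b: list[int]) -> float:
    """Fraction of hash functions whose minima agree."""
    return sum(a == b for a, b in zip(sketch_a, sketch_b)) / len(sketch_a)


def exact_weighted_jaccard(a: str, b: str, k: int, counts: Counter) -> float:
    """Reference value: total weight of shared k-mers over total weight of the union.
    A k-mer's weight is the same in both reads, so min/max reduce to the weight."""
    max_count = max(counts.values())
    ka, kb = kmers(a, k), kmers(b, k)
    shared = sum(kmer_weight(counts[km], max_count) for km in ka & kb)
    union = sum(kmer_weight(counts[km], max_count) for km in ka | kb)
    return shared / union


if __name__ == "__main__":
    read_a = "ACGTTGCATGCCGATAGCTTAGGCTAACGTTAGCCGATCGGATCCATG"
    read_b = read_a[10:] + "TTGACCGATGCA"   # suffix of read_a = prefix of read_b
    read_c = "GGGGCCCCAAAATTTT" * 2          # repeat-like read seen many times
    counts = count_kmers([read_a, read_b] + [read_c] * 4, k=8)
    sketch_a = weighted_sketch(read_a, 8, counts)
    sketch_b = weighted_sketch(read_b, 8, counts)
    print("estimated:", estimate_similarity(sketch_a, sketch_b))
    print("exact:    ", exact_weighted_jaccard(read_a, read_b, 8, counts))
```

Because k-mers that occur in many reads receive weight 1 while rarer k-mers receive more labeled copies, repeat-like k-mers are less likely to supply the minimum for any given hash function. A real overlapper would also need to handle reverse complements, sequencing errors, and much longer reads; this toy only shows how weighting shifts the sketch away from frequent k-mers.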

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5411767
DOI: http://dx.doi.org/10.1101/gr.215087.116
