High-quality genome assemblies are crucial to many biological studies, and utilizing long sequencing reads can help achieve higher assembly contiguity. While long reads can resolve complex and repetitive regions of a genome, their relatively high associated error rates are still a major limitation. Long reads generally produce draft genome assemblies with lower base quality, which must be corrected with a genome polishing step. Hybrid genome polishing solutions can greatly improve the quality of long-read genome assemblies by utilizing more accurate short reads to validate bases and correct errors. Currently available hybrid polishing methods rely on read alignments, and are therefore memory-intensive and do not scale well to large genomes. Here we describe ntEdit+Sealer, an alignment-free, k-mer-based genome finishing protocol that employs memory-efficient Bloom filters. The protocol includes ntEdit for correcting base errors and small indels, and for marking potentially problematic regions, then Sealer for filling both assembly gaps and problematic regions flagged by ntEdit. ntEdit+Sealer produces highly accurate, error-corrected genome assemblies, and is available as a Makefile pipeline from https://github.com/bcgsc/ntedit_sealer_protocol. © 2022 The Authors. Current Protocols published by Wiley Periodicals LLC.

Basic Protocol: Automated long-read genome finishing with short reads

Support Protocol: Selecting optimal values for k-mer lengths (k) and Bloom filter size (b).
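The alignment-free idea described in the abstract can be illustrated with a minimal Python sketch: k-mers from accurate short reads are stored in a Bloom filter, and draft-assembly k-mers that are absent from the filter are reported as unsupported. This is not ntEdit's or Sealer's implementation. The class `KmerBloomFilter`, the helpers `kmers`, `unsupported_positions`, and `expected_fpr`, and the toy sequences are all hypothetical names chosen for illustration, and the sketch skips reverse-complement (canonical) k-mers for brevity. The `expected_fpr` helper uses the standard Bloom filter approximation, which hints at why the filter size (b) matters, but it is not the Support Protocol's procedure for choosing k or b.

```python
import hashlib
import math


class KmerBloomFilter:
    """Toy Bloom filter for k-mers (illustrative only, not ntEdit's data structure)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, kmer):
        # Derive one bit position per hash function by salting BLAKE2b with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                kmer.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, kmer):
        for pos in self._positions(kmer):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, kmer):
        # May return a false positive, never a false negative.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(kmer)
        )


def kmers(seq, k):
    """Yield every overlapping k-mer of seq, left to right."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def unsupported_positions(draft, bloom, k):
    """Start positions of draft k-mers that are not found in the read-derived filter."""
    return [i for i, kmer in enumerate(kmers(draft, k)) if kmer not in bloom]


def expected_fpr(num_bits, num_hashes, num_items):
    """Standard approximation of the Bloom filter false-positive rate."""
    return (1.0 - math.exp(-num_hashes * num_items / num_bits)) ** num_hashes


# Toy data (hypothetical): two short reads and a draft with one substitution.
K = 5
reads = ["ACGTACGTGA", "GTACGTGACC"]
draft = "ACGTACCTGACC"  # true sequence would be ACGTACGTGACC

bloom = KmerBloomFilter(num_bits=1024, num_hashes=3)
for read in reads:
    for kmer in kmers(read, K):
        bloom.add(kmer)

# Positions whose k-mers overlap the erroneous base are expected to be reported,
# barring Bloom filter false positives.
print(unsupported_positions(draft, bloom, K))
print(expected_fpr(num_bits=1024, num_hashes=3, num_items=8))
```

Because a Bloom filter can report false positives but never false negatives, a k-mer that is present in the reads is always found, while an erroneous k-mer is occasionally accepted by chance. A larger filter lowers that chance at the cost of memory, which is the trade-off behind tuning the filter size.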

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9196995
DOI: http://dx.doi.org/10.1002/cpz1.442

Publication Analysis

Top Keywords

genome assemblies: 20
long-read genome: 12
genome: 10
long reads: 8
genome polishing: 8
short reads: 8
genome finishing: 8
problematic regions: 8
assemblies: 5
reads: 5

Similar Publications

Population structure and genetic diversity of Toona sinensis revealed by whole-genome resequencing.

BMC Genom Data

January 2025

Key Laboratory of State Forestry and Grassland Administration Conservation and Utilization of Warm Temperate Zone Forest and Grass Germplasm Resources, Shandong Provincial Center of Forest and Grass Germplasm Resources, Ji'nan, 250103, Shandong, China.

Objectives: Toona sinensis, commonly known as Chinese toon, is a perennial woody plant with significant economic and ecological importance. This study employed whole-genome resequencing of 180 T. sinensis samples collected from Shandong to analyze genetic variation and diversity, ultimately identifying 18,231 high-quality SNPs after rigorous quality control and linkage disequilibrium pruning.

Background: Tea-oil Camellia, within the genus Camellia, is renowned for its premium Camellia oil, often described as "Oriental olive oil". So far, only one partial mitochondrial genome of Tea-oil Camellia has been published (and none from the main Tea-oil Camellia cultivars), and comparative mitochondrial genomic studies of Camellia remain limited.

Results: In this study, we first reconstructed the entire mitochondrial genome of C.

Background: The advent of next-generation sequencing technologies has enabled a surge in the number of whole-genome sequences in public databases and has advanced our understanding of the composition and evolution of bacterial genomes. Besides model organisms and pathogens, some attention has been dedicated to industrial bacteria, notably members of the Lactobacillaceae family that are commonly studied and formulated as probiotic bacteria. Of particular interest is Lactobacillus acidophilus NCFM, an extensively studied strain that has been widely commercialized for decades and is being used for the delivery of vaccines and therapeutics.

The highly allo-autopolyploid modern sugarcane genome and very recent allopolyploidization in Saccharum.

Nat Genet

January 2025

Center for Genomics, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Haixia Institute of Science and Technology, Fujian Agriculture and Forestry University, Fuzhou, China.

Modern sugarcane, a highly allo-autopolyploid organism, has a very complex genome. In the present study, the karyotype and genome architecture of modern sugarcane were investigated, resulting in a genome assembly of 97 chromosomes (8.84 Gb).

The nucleosome is the basic structural unit of the genome. During processes like DNA replication and gene transcription, the conformation of nucleosomes undergoes dynamic changes, including DNA unwrapping and rewrapping, as well as histone disassembly and assembly. However, the wrapping characteristics of nucleosomes across the entire genome, including their region-specificity and their correlation with higher-order chromatin organization, remain to be studied.
