Genome assembly quality: assessment and improvement using the neutral indel model.

Genome Res

Medical Research Council Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford OX1 3QX, United Kingdom.

Published: May 2010

We describe a statistical and comparative-genomic approach for quantifying error rates of genome sequence assemblies. The method exploits not substitutions but the pattern of insertions and deletions (indels) in genome-scale alignments for closely related species. Using two- or three-way alignments, the approach estimates the amount of aligned sequence containing clusters of nucleotides that were wrongly inserted or deleted during sequencing or assembly. Thus, the method is well-suited to assessing fine-scale sequence quality within single assemblies, between different assemblies of a single set of reads, and between genome assemblies for different species. When applying this approach to four primate genome assemblies, we found that average gap error rates per base varied considerably, by up to sixfold. As expected, bacterial artificial chromosome (BAC) sequences contained lower, but still substantial, predicted numbers of errors, arguing for caution in regarding BACs as the epitome of genome fidelity. We then mapped short reads, at approximately 10-fold statistical coverage, from a Bornean orangutan onto the Sumatran orangutan genome assembly originally constructed from capillary reads. This resulted in a reduced gap error rate and a separation of error-prone from high-fidelity sequence. Over 5000 predicted indel errors in protein-coding sequence were corrected in a hybrid assembly. Our approach contributes a new fine-scale quality metric for assemblies that should facilitate development of improved genome sequencing and assembly strategies.
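The abstract does not spell out the neutral indel model, so the following is a minimal sketch under stated assumptions. It is not the authors' estimator.

The sketch assumes that, in neutrally evolving and error-free sequence, indels fall independently along the alignment. Under that assumption, the lengths of intergap segments (IGS: runs of gap-free alignment columns between consecutive indels) follow a geometric distribution. Clusters of wrongly inserted or deleted nucleotides would then show up as an excess of short segments over that expectation.

Several choices here are hypothetical and made only for illustration, not taken from the paper:

- the function names `intergap_segments` and `short_igs_excess`;
- the cut-off parameter `threshold`;
- fitting the geometric rate only to segments longer than the cut-off.

```python
"""Illustrative sketch: excess of short intergap segments (IGS) relative to a
geometric expectation. Assumes neutral, independent indels; all names here are
hypothetical and do not come from the paper."""


def intergap_segments(row_a: str, row_b: str) -> list[int]:
    """Return lengths of gap-free runs bounded by indels on both sides.

    A column is a gap column if either aligned row has '-'; a run of
    consecutive gap columns is treated as a single indel event.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    lengths: list[int] = []
    run = 0            # gap-free columns since the last gap column
    seen_gap = False   # have we passed at least one indel yet?
    in_gap = False     # was the previous column a gap column?
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            if seen_gap and not in_gap:
                lengths.append(run)  # segment closed by a new indel event
            seen_gap = True
            in_gap = True
            run = 0
        else:
            in_gap = False
            run += 1
    # Leading and trailing runs are not indel-bounded, so they are dropped.
    return lengths


def short_igs_excess(lengths: list[int], threshold: int) -> tuple[int, float, float]:
    """Compare observed short segments (<= threshold) with a geometric fit.

    Assumption: IGS lengths L >= 1 are geometric, P(L = l) = p * (1 - p)**(l - 1).
    By memorylessness, L - threshold for L > threshold has the same p, so p is
    fitted (maximum-likelihood estimate = 1 / mean) on the tail only.
    """
    tail = [length - threshold for length in lengths if length > threshold]
    if not tail:
        raise ValueError("no segments longer than threshold")
    p = len(tail) / sum(tail)
    if p >= 1.0:
        raise ValueError("tail too short to estimate a geometric rate")
    # Under the model the tail holds a fraction (1 - p)**threshold of all
    # segments, so expected short = tail * ((1 - p)**-threshold - 1).
    expected_short = len(tail) * ((1.0 - p) ** (-threshold) - 1.0)
    observed_short = sum(1 for length in lengths if length <= threshold)
    return observed_short, expected_short, observed_short - expected_short


if __name__ == "__main__":
    print(intergap_segments("ACGT-ACG--TACGTA", "ACGTTACGAATAC-TA"))  # [3, 3]
    obs, exp, excess = short_igs_excess([1, 1, 1, 2, 10, 12, 14], threshold=5)
    print(obs, round(exp, 2), round(excess, 2))  # 4 3.48 0.52
```

Fitting only the tail relies on the memorylessness of the geometric distribution, which keeps the short, possibly error-enriched segments from biasing the rate estimate. In practice one would presumably also exclude very long segments, since functionally constrained sequence is depleted of indels. Dividing the excess by the number of aligned bases is one crude way to express it per base. It is offered only as a proxy, not as the gap error rate reported above.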


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2860169
DOI: http://dx.doi.org/10.1101/gr.096966.109


