Inexpensive de novo genome sequencing, particularly of organisms with small genomes, is now possible using several new sequencing technologies. Some of these technologies, such as Illumina's Solexa sequencing, produce high genomic coverage by generating a very large number of short reads (approximately 30 bp). While prior work shows that partial assembly can be performed by k-mer extension in error-free reads, this algorithm is unsuccessful at the sequencing error rates found in practice. We present VCAKE (Verified Consensus Assembly by K-mer Extension), a modification of simple k-mer extension that overcomes error by using high-depth coverage. Though it is a simple modification of a previous approach, we show significant improvements in assembly results on simulated and experimental datasets that include error.
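
The abstract describes the idea only at a high level. The code below is a minimal Python sketch of consensus-style k-mer extension in that spirit, not the published VCAKE implementation. The function names, the thresholds `min_depth` and `min_ratio`, the cycle guard and the toy reads are all illustrative assumptions. Every read occurrence of the contig's terminal k-mer votes for the base that follows it, and the contig grows only while one base wins a clear, well-covered majority.

```python
from collections import Counter, defaultdict


def build_kmer_index(reads, k):
    """Map each k-mer to a Counter of the bases that follow it across all reads."""
    index = defaultdict(Counter)
    for read in reads:
        for i in range(len(read) - k):
            index[read[i:i + k]][read[i + k]] += 1
    return index


def extend_consensus(seed, index, k, min_depth=3, min_ratio=0.8, max_len=100_000):
    """Grow `seed` to the right one base at a time by majority vote.

    Every read occurrence of the contig's last k bases votes for the base
    that follows it. The winning base is appended only if it has at least
    `min_depth` votes and at least `min_ratio` of all votes, so a rare
    erroneous base is outvoted instead of halting the contig.
    (Hypothetical parameters; not taken from the VCAKE paper.)
    """
    contig = seed
    seen = set()
    while len(contig) < max_len:
        kmer = contig[-k:]
        if kmer in seen:
            break  # cycle, e.g. inside a tandem repeat
        seen.add(kmer)
        votes = index.get(kmer)
        if not votes:
            break  # no read continues this k-mer
        base, count = votes.most_common(1)[0]
        if count < min_depth or count / sum(votes.values()) < min_ratio:
            break  # too little coverage, or no clear consensus
        contig += base
    return contig


# Toy data (hypothetical): 8 bp reads tiled across a 16 bp genome, 3 copies per
# start, plus one read with a single substitution (C -> G at 0-based position 9).
genome = "ATGGCGTACCTTAGCA"
reads = [genome[s:s + 8] for s in range(len(genome) - 7) for _ in range(3)]
reads.append("CGTACGTT")
index = build_kmer_index(reads, k=5)

print(extend_consensus(reads[0], index, k=5))  # ATGGCGTACCTTAGCA
# Requiring unanimity mimics error-intolerant extension: it stops at the error.
print(extend_consensus(reads[0], index, k=5, min_depth=1, min_ratio=1.0))  # ATGGCGTAC
```

With the default thresholds, the k-mer `CGTAC` receives nine votes for `C` and one for `G`, so the consensus walks through the error and rebuilds the whole toy genome. Demanding unanimous votes, as a strict error-free extension would, halts the contig at that first disagreement.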

Availability: http://152.2.15.114/~labweb/VCAKE

Source: http://dx.doi.org/10.1093/bioinformatics/btm451

Similar Publications

Background: Metagenomics is a powerful approach to study environmental and human-associated microbial communities and, in particular, the role of viruses in shaping them. Viral genomes are challenging to assemble from metagenomic samples due to their genomic diversity caused by high mutation rates. In the standard de Bruijn graph assemblers, this genomic diversity leads to complex k-mer assembly graphs with a plethora of loops and bulges that are challenging to resolve into strains or haplotypes because variants more than the k-mer size apart cannot be phased.

An extension of the longest common substring (LCS) problem between two texts is the enumeration of all LCSs given a minimum length (ALCS- ), along with their positions in each text. In bioinformatics, an efficient solution to the ALCS- for very long texts (genomes or metagenomes) can provide useful insights to discover genetic signatures responsible for biological mechanisms. The ALCS- problem has two additional requirements compared to the LCS problem: one is the minimum length , and the other is that all common strings longer than must be reported.

Genome sequencing for agriculturally important Rosaceous crops has made rapid progress both in completeness and annotation quality. Whole genome sequence and annotation gives breeders, researchers, and growers information about cultivar specific traits such as fruit quality and disease resistance, and informs strategies to enhance postharvest storage. Here we present a haplotype-phased, chromosomal level genome of Malus domestica, 'WA 38', a new apple cultivar released to market in 2017 as Cosmic Crisp®.

Seed-chain-extend with k-mer seeds is a powerful heuristic technique for sequence alignment used by modern sequence aligners. Although effective in practice for both runtime and accuracy, theoretical guarantees on the resulting alignment do not exist for seed-chain-extend. In this work, we give the first rigorous bounds for the efficacy of seed-chain-extend with k-mers. Assume we are given a random nucleotide sequence of length ∼ that is indexed (or seeded) and a mutated substring of length ∼ ≤ with mutation rate θ < 0.

Genome Survey Sequencing and Genetic Background Characterization of Sims (Aquifoliaceae) Based on Next-Generation Sequencing.

Plants (Basel)

December 2022

Jiangsu Academy of Forestry, 109 Danyang Road, Dongshanqiao, Nanjing 211153, China.

Sims. is an evergreen arbor species with high ornamental and medicinal value that is widely distributed in China. However, there is a lack of molecular and genomic data for this plant, which severely restricts the development of its relevant research.
