Optimal assembly for high throughput shotgun sequencing.

BMC Bioinformatics

Department of EECS, UC Berkeley, California, USA.

Published: December 2013

We present a framework for designing optimal assembly algorithms for shotgun sequencing under the criterion of complete reconstruction. We derive a lower bound on the read length and coverage depth required for reconstruction in terms of the repeat statistics of the genome. Building on earlier work, we design a de Bruijn graph-based assembly algorithm that comes very close to the lower bound for the repeat statistics of a wide range of sequenced genomes, including the GAGE datasets. The results rest on a set of necessary and sufficient conditions on the DNA sequence and the reads for reconstruction. These conditions can be viewed as the shotgun sequencing analogue of the Ukkonen-Pevzner necessary and sufficient conditions for Sequencing by Hybridization.
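To make the two ingredients named in the abstract concrete, the following is a minimal Python sketch, not the authors' algorithm. It shows a textbook de Bruijn graph built from the k-mers of a set of reads, and a naive computation of one simple repeat statistic, the length of the longest substring that occurs at least twice. The function names `de_bruijn_graph` and `longest_repeat_length`, and the choice of this particular statistic, are assumptions made for illustration only; the paper's actual bound and algorithm are given in the full text.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph from the distinct k-mers of the reads.

    Nodes are (k-1)-mers; each distinct k-mer contributes one edge from
    its (k-1)-length prefix to its (k-1)-length suffix.
    (Hypothetical helper for illustration, not the paper's algorithm.)
    """
    graph = defaultdict(list)
    seen = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in seen:
                continue  # count each distinct k-mer once
            seen.add(kmer)
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def longest_repeat_length(genome):
    """Return the length of the longest substring occurring at least twice.

    Naive approach for small inputs: if a repeat of a given length
    exists, a repeat of every shorter length exists too, so the search
    stops at the first length with no repeated substring.
    """
    n = len(genome)
    best = 0
    for length in range(1, n):
        seen = set()
        found = False
        for i in range(n - length + 1):
            sub = genome[i:i + length]
            if sub in seen:
                found = True
                break
            seen.add(sub)
        if not found:
            break
        best = length
    return best
```

Intuitively, reads that are too short relative to the repeats in the genome cannot disambiguate them, which is why a lower bound on read length is naturally expressed through repeat statistics. The naive repeat search above is only suitable for toy sequences; a suffix array or suffix tree would be the usual choice at genome scale.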

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3706340
DOI: http://dx.doi.org/10.1186/1471-2105-14-S5-S18
