Improved gap size estimation for scaffolding algorithms.

Bioinformatics

Department of Computational Biology, KTH Royal Institute of Technology, Science for Life Laboratory, School of Computer Science and Communication, Solna, Sweden.

Published: September 2012

Motivation: One of the important steps of genome assembly is scaffolding, in which contigs are linked using information from read-pairs. Scaffolding provides estimates of the order, relative orientation and distance between contigs. We have found that contig distance estimates are generally strongly biased and based on false assumptions. Since erroneous distance estimates can mislead subsequent analysis, it is important to provide unbiased estimation of contig distance.

Results: In this article, we show that state-of-the-art programs for scaffolding use an incorrect model of gap size estimation. We discuss why current maximum likelihood estimators are biased and describe the different cases of bias that arise. Furthermore, we provide a model for the distribution of reads that span a gap and derive the maximum likelihood equation for the gap length. We motivate why this estimate is sound and show empirically that it outperforms gap estimators in popular scaffolding programs. Our results have consequences for scaffolding software, for structural variation detection and for library insert-size estimation as commonly performed by read aligners.
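To make the source of the bias concrete, the following is a minimal simulation sketch. It is not the reference implementation and not necessarily the exact model derived in the article. It assumes normally distributed insert sizes with known mean and standard deviation, ignores read length, and treats fragment start positions as uniform. All names and parameter values (MU, SIGMA, C1, C2, TRUE_GAP, span_weight, naive_gap, ml_gap) are hypothetical and chosen for illustration. Under these assumptions, longer fragments have more placements that span the gap, so the observed pairs are not a representative sample of the library, and subtracting their mean from the library mean underestimates the gap. Conditioning the likelihood on the spanning event corrects for this.

```python
import math
import random

# Illustrative assumptions, not values from the article.
MU, SIGMA = 2000.0, 500.0   # library insert-size mean and standard deviation
C1, C2 = 5000, 5000         # lengths of the two contigs flanking the gap
TRUE_GAP = 1000             # gap size used to simulate observations


def log_normal_pdf(x, mu=MU, sigma=SIGMA):
    """Log density of the assumed normal insert-size distribution."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def span_weight(y, c1=C1, c2=C2):
    """Relative number of start positions at which a fragment spans the gap,
    where y = insert size - gap is the part of the fragment lying on the contigs."""
    return max(0.0, min(c1, c1 + c2 - y) - max(0.0, c1 - y))


def simulate(n, gap, rng):
    """Draw n observations (insert size minus gap) from pairs that span the gap."""
    obs = []
    while len(obs) < n:
        start = rng.uniform(0, C1)        # fragment starts on the left contig
        insert = rng.gauss(MU, SIGMA)
        end = start + insert
        if C1 + gap < end <= C1 + gap + C2:   # fragment ends on the right contig
            obs.append(insert - gap)
    return obs


def naive_gap(obs):
    """Library mean minus mean observation; ignores the selection of spanning pairs."""
    return MU - sum(obs) / len(obs)


def log_likelihood(gap, obs, step=10):
    """Log-likelihood of the gap, conditioned on each pair spanning the gap.
    The normalising constant is approximated by a Riemann sum over y."""
    norm = step * sum(
        math.exp(log_normal_pdf(y + gap)) * span_weight(y)
        for y in range(0, C1 + C2 + 1, step)
    )
    return sum(log_normal_pdf(o + gap) for o in obs) - len(obs) * math.log(norm)


def ml_gap(obs, candidates=range(0, 3001, 10)):
    """Grid search for the gap that maximises the conditional likelihood."""
    return max(candidates, key=lambda g: log_likelihood(g, obs))


if __name__ == "__main__":
    observations = simulate(1000, TRUE_GAP, random.Random(0))
    print("true gap:      ", TRUE_GAP)
    print("naive estimate:", round(naive_gap(observations)))
    print("ML estimate:   ", ml_gap(observations))
```

The grid search is used only to keep the sketch short; a root finder on the derivative of the log-likelihood would serve equally well. The term span_weight(o) for each observation does not depend on the gap, so it is omitted from the sum and only enters through the normalising constant.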

Availability: A reference implementation is provided at https://github.com/SciLifeLab/gapest.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source: http://dx.doi.org/10.1093/bioinformatics/bts441

