Merging of multi-string BWTs with applications.

Bioinformatics

Department of Computer Science, 201 S. Columbia St. UNC-CH, Chapel Hill, NC 27599, USA.

Published: December 2014

Motivation: The throughput of genomic sequencing has increased to the point that it is overrunning the rate of downstream analysis. This, along with the desire to revisit old data, has led to a situation where large quantities of raw, and nearly impenetrable, sequence data are rapidly filling the hard drives of modern biology labs. These datasets can be compressed via a multi-string variant of the Burrows-Wheeler Transform (BWT), which has the side benefit of supporting searches for arbitrary k-mers within the raw data, as well as the ability to reconstitute arbitrary reads as needed. We propose a method for merging such datasets for both increased compression and downstream analysis.
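To make the k-mer search property concrete, here is a minimal Python sketch. It is an illustration only, not the authors' implementation: the function names `build_msbwt` and `count_kmer`, the `$` terminator convention, and the naive suffix-sorting construction are all assumptions chosen for brevity, and the construction is far too slow for real sequencing data.

```python
from collections import Counter


def build_msbwt(reads):
    """Naive multi-string BWT (assumed convention: each read ends in '$').

    Every suffix of every terminated read is sorted; ties between equal
    suffixes are broken by the rank of the read they came from.
    """
    ordered = sorted(read + "$" for read in reads)
    rows = []
    for rank, text in enumerate(ordered):
        for i in range(len(text)):
            # text[i - 1] wraps around to '$' when i == 0.
            rows.append((text[i:], rank, text[i - 1]))
    rows.sort()
    return "".join(prev for _, _, prev in rows)


def count_kmer(bwt, kmer):
    """Count occurrences of kmer in the reads using backward search."""
    counts = Counter(bwt)
    starts, total = {}, 0
    for sym in sorted(counts):
        starts[sym] = total  # first row whose suffix begins with sym
        total += counts[sym]
    lo, hi = 0, len(bwt)
    for sym in reversed(kmer):
        if sym not in starts:
            return 0
        # str.count stands in for a rank index; a real index precomputes it.
        lo = starts[sym] + bwt.count(sym, 0, lo)
        hi = starts[sym] + bwt.count(sym, 0, hi)
        if lo >= hi:
            return 0
    return hi - lo


bwt = build_msbwt(["ACG", "ACT"])
print(bwt, count_kmer(bwt, "AC"), count_kmer(bwt, "CG"))
```

The search never touches the original reads: it narrows a range of rows in the transform one symbol at a time, which is why the compressed form can answer k-mer queries directly.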

Results: We present a novel algorithm that merges multi-string BWTs in [Formula: see text] time, where LCS is the length of the longest common substring between any of the inputs and N is the total length of all inputs combined (number of symbols), using [Formula: see text] bits, where F is the number of multi-string BWTs merged. The merged multi-string BWT is also shown to be more compressible than the input multi-string BWTs stored separately. Additionally, we explore some uses of a merged multi-string BWT in bioinformatics applications.
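The abstract does not spell out the merge procedure, so the following Python sketch should be read as one plausible way to merge two multi-string BWTs without decoding them, not as the authors' algorithm. It assumes `$`-terminated reads and a hypothetical function name `merge_msbwts`. The idea is to repeatedly refine an "interleave" array that records, for each row of the merged transform, which input it comes from; each pass is a stable bucket sort by one more symbol of context, and the loop stops when a pass changes nothing.

```python
from collections import Counter


def merge_msbwts(bwt0, bwt1):
    """Merge two multi-string BWTs by iteratively refining an interleave.

    Sketch under assumptions: both inputs use '$' as the read terminator
    and the same symbol ordering (plain code-point order here).
    """
    sources = (bwt0, bwt1)
    counts = Counter(bwt0 + bwt1)
    starts, total = {}, 0
    for sym in sorted(counts):
        starts[sym] = total  # first merged row in this symbol's bucket
        total += counts[sym]

    # Pass 0: rows are ordered by source only (no context compared yet).
    interleave = [0] * len(bwt0) + [1] * len(bwt1)
    while True:
        next_slot = dict(starts)
        cursor = [0, 0]
        refined = [0] * total
        for src in interleave:
            # The symbol preceding this row decides the bucket of the
            # row that is one symbol longer.
            sym = sources[src][cursor[src]]
            cursor[src] += 1
            refined[next_slot[sym]] = src
            next_slot[sym] += 1
        if refined == interleave:
            break  # another symbol of context no longer changes the order
        interleave = refined

    # Read the merged BWT off the converged interleave.
    cursor = [0, 0]
    merged = []
    for src in interleave:
        merged.append(sources[src][cursor[src]])
        cursor[src] += 1
    return "".join(merged)


# "T$AC" and "G$AC" are the BWTs of the single reads ACT and ACG.
print(merge_msbwts("T$AC", "G$AC"))
```

In this sketch each pass is linear in the total number of symbols, and the number of passes grows with how much context the inputs share. The interleave holds one source label per symbol; with only two inputs a single bit per symbol would suffice.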

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4318930	PMC
http://dx.doi.org/10.1093/bioinformatics/btu584	DOI Listing

Authors: James Holt, Leonard McMillan
