The dramatic increase in data produced by next-generation sequencing (NGS) technologies demands data compression tools that save storage space. However, effective and efficient compression of genome sequencing data remains an unresolved challenge in NGS data studies. In this paper, we propose a novel alignment-free and reference-free compression method, BdBG, which is the first to compress genome sequencing data using dynamic de Bruijn graphs built from the data after bucketing. Compared with existing de Bruijn graph methods, BdBG stores only a list of bucket indexes and bifurcations for the raw read sequences, which effectively reduces storage space. Experimental results on several genome sequencing datasets show the effectiveness of BdBG over three state-of-the-art methods. BdBG is written in Python and is open-source software distributed under the MIT license, available for download at https://github.com/rongjiewang/BdBG.
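
To make the idea of storing only bifurcations concrete, the following minimal Python sketch illustrates the general technique. It is not the BdBG implementation: the minimizer-based bucketing, the function names (`minimizer`, `bucket_reads`, `encode_bucket`, `decode_bucket`), the toy values of `K` and `m`, and the output layout are all assumptions made for illustration. Within each bucket, a de Bruijn graph grows from the reads already processed, and a base is recorded only where that graph cannot predict it.

```python
from collections import defaultdict

K = 4  # k-mer length; a toy value assumed for illustration


def minimizer(read, m=3):
    """Smallest m-mer of a read, used here as a hypothetical bucket key."""
    if len(read) < m:
        return read
    return min(read[i:i + m] for i in range(len(read) - m + 1))


def bucket_reads(reads, m=3):
    """Group reads sharing a minimizer so similar reads are encoded together."""
    buckets = defaultdict(list)
    for read in reads:
        buckets[minimizer(read, m)].append(read)
    return dict(buckets)


def encode_bucket(reads, k=K):
    """Encode each read as (first k-mer, length, {position: base})."""
    graph = defaultdict(set)  # k-mer -> set of bases seen after it so far
    encoded = []
    for read in reads:
        branches = {}
        for i in range(k, len(read)):
            node, base = read[i - k:i], read[i]
            # Record the base only when the graph does not predict it uniquely.
            if graph[node] != {base}:
                branches[i] = base
        encoded.append((read[:k], len(read), branches))
        # Grow the graph only after the read is encoded, so the decoder
        # can rebuild exactly the same graph state.
        for i in range(k, len(read)):
            graph[read[i - k:i]].add(read[i])
    return encoded


def decode_bucket(encoded, k=K):
    """Invert encode_bucket by replaying the same dynamic graph."""
    graph = defaultdict(set)
    reads = []
    for start, length, branches in encoded:
        read = start
        for i in range(k, length):
            if i in branches:
                read += branches[i]
            else:
                (base,) = graph[read[i - k:i]]  # the unique successor
                read += base
        reads.append(read)
        for i in range(k, len(read)):
            graph[read[i - k:i]].add(read[i])
    return reads


# Round trip over every bucket of a small, made-up read set.
reads = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGGAC", "TTGACCA"]
for key, group in bucket_reads(reads).items():
    assert decode_bucket(encode_bucket(group)) == group
```

In this sketch, a read that repeats an earlier read in the same bucket needs no stored bases beyond its first k-mer, which is the intuition behind the storage savings the abstract describes. A practical compressor would additionally entropy-code the resulting lists, which is omitted here.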


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6197042
DOI: http://dx.doi.org/10.7717/peerj.5611

