HapZipper: sharing HapMap populations just got easier.

Nucleic Acids Res

Department of Biomedical Engineering, Johns Hopkins University, High Throughput Biology Center, Johns Hopkins University School of Medicine, McKusick-Nathans Institute of Genetic Medicine, Baltimore, MD 21205, USA.

Published: November 2012

The rapidly growing amount of genomic sequence data being generated and made publicly available necessitates the development of new data storage and archiving methods. The vast amount of data being shared and manipulated also creates new challenges for network resources. Thus, developing advanced data compression techniques is becoming an integral part of data production and analysis. The HapMap project is one of the largest public resources of human single-nucleotide polymorphisms (SNPs), characterizing over 3 million SNPs genotyped in over 1000 individuals. The standard format and biological properties of HapMap data suggest that a dedicated genetic compression method can outperform generic compression tools. We propose a compression methodology for genetic data by introducing HapZipper, a lossless compression tool tailored to HapMap data that compresses beyond the benchmarks set by generic tools such as gzip, bzip2 and lzma. We demonstrate the usefulness of HapZipper by compressing HapMap 3 populations to <5% of their original sizes. HapZipper is freely downloadable from https://bitbucket.org/pchanda/hapzipper/downloads/HapZipper.tar.bz2.
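To put a figure such as "<5% of original size" in context, the generic-tool baselines (gzip, bzip2, lzma) can be measured with Python's standard-library gzip, bz2 and lzma modules. The sketch below is not HapZipper's algorithm; it only reports each generic compressor's output size as a percentage of the input size. The function name compressed_percent and the toy input are hypothetical, and the input is synthetic text loosely shaped like a genotype table, not real HapMap data.

```python
import bz2
import gzip
import lzma
from typing import Dict


def compressed_percent(data: bytes) -> Dict[str, float]:
    """Return each generic compressor's output size as a percentage of the input size.

    Hypothetical helper for illustration; it measures generic baselines only.
    """
    if not data:
        raise ValueError("data must be non-empty")
    sizes = {
        "gzip": len(gzip.compress(data, compresslevel=9)),
        "bzip2": len(bz2.compress(data, compresslevel=9)),
        "lzma": len(lzma.compress(data)),
    }
    return {name: 100.0 * size / len(data) for name, size in sizes.items()}


# Hypothetical toy input: one line per SNP (an rs-style ID followed by one
# genotype per individual). Synthetic and highly regular, NOT real HapMap data.
rows = [
    f"rs{1000 + i} " + " ".join("AG" if (i + j) % 3 else "AA" for j in range(40))
    for i in range(500)
]
sample = ("\n".join(rows) + "\n").encode("ascii")

for name, pct in compressed_percent(sample).items():
    print(f"{name}: {pct:.1f}% of original size")
```

The percentages printed for this artificial input say nothing about real HapMap files; run on actual data, the same measurement gives the kind of generic baseline a dedicated tool is compared against.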

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3488212 (PMC)
http://dx.doi.org/10.1093/nar/gks709 (DOI)