Zerone: a ChIP-seq discretizer for multiple replicates with built-in quality control.

Bioinformatics

Genome Architecture, Gene Regulation, Stem Cells and Cancer Programme, Centre for Genomic Regulation (CRG), the Barcelona Institute of Science and Technology, Barcelona 08003, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain.

Published: October 2016

Motivation: Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is the standard method to investigate chromatin protein composition. As the number of community-available ChIP-seq profiles increases, it becomes more common to use data from different sources, which makes joint analysis challenging. Issues such as lack of reproducibility, heterogeneous quality and conflicts between replicates become evident when comparing datasets, especially when they are produced by different laboratories.

Results: Here, we present Zerone, a ChIP-seq discretizer with built-in quality control. Zerone is powered by a Hidden Markov Model with zero-inflated negative multinomial emissions, which allows it to merge several replicates into a single discretized profile. To identify low-quality or irreproducible data, we trained a Support Vector Machine and integrated it into the discretization process. The result is a classifier reaching 95% accuracy in detecting low-quality profiles. We also introduce a graphical representation to compare discretization quality and we show that Zerone achieves outstanding accuracy. Finally, on current hardware, Zerone discretizes a ChIP-seq experiment on mammalian genomes in about 5 min using less than 700 MB of memory.
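To make the emission model more concrete, the following is a minimal Python sketch of a zero-inflated negative multinomial log-probability for one genomic window. It is not taken from the Zerone source code. The function names, the parameterization (`r`, `p0`, `p`) and the choice to place the extra zero mass `pi_zero` on the all-zero read-count vector are assumptions made for illustration; the parameterization used in Zerone itself may differ.

```python
import math

def neg_multinomial_logpmf(x, r, p0, p):
    """Log-pmf of a negative multinomial distribution.

    x  : list of non-negative integer read counts (one per profile)
    r  : positive dispersion parameter
    p0 : probability attached to the "stop" outcome
    p  : list of positive probabilities, with p0 + sum(p) == 1
    (Illustrative parameterization, assumed for this sketch.)
    """
    n = sum(x)
    out = math.lgamma(r + n) - math.lgamma(r) + r * math.log(p0)
    for xi, pi in zip(x, p):
        out += xi * math.log(pi) - math.lgamma(xi + 1)
    return out

def zinm_logpmf(x, pi_zero, r, p0, p):
    """Log-pmf of a zero-inflated negative multinomial.

    Assumption: with probability pi_zero the window emits the all-zero
    vector; otherwise counts follow the negative multinomial above.
    """
    base = neg_multinomial_logpmf(x, r, p0, p)
    if all(xi == 0 for xi in x):
        # The all-zero vector can come from either mixture component.
        return math.log(pi_zero + (1.0 - pi_zero) * math.exp(base))
    return math.log(1.0 - pi_zero) + base

# Hypothetical window with two replicates (counts 3 and 1).
log_prob = zinm_logpmf([3, 1], pi_zero=0.1, r=2.0, p0=0.5, p=[0.3, 0.2])
```

In a Hidden Markov Model, each hidden state would carry its own set of such parameters, and the per-window log-probabilities would feed the forward-backward or Viterbi recursions. Modeling all replicates jointly in one count vector is what lets a single state sequence summarize several profiles.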

Availability and Implementation: Zerone is available as a command line tool and as an R package. The C source code and R scripts can be downloaded from https://github.com/nanakiksc/zerone. The information to reproduce the benchmark and the figures is stored in a public Docker image that can be downloaded from https://hub.docker.com/r/nanakiksc/zerone/.

Contact: guillaume.filion@gmail.com

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5039920
DOI: http://dx.doi.org/10.1093/bioinformatics/btw336
