Transcription factors play a key role in gene regulation by binding to specific sites, or motifs, in the DNA. Enrichment of binding motifs is therefore important for genome annotation, and efficient computation of its statistical significance, the p-value, is crucial. We propose an efficient approximation for computing this significance. Because the approach incorporates both strands of the DNA molecule and explicitly models the dependencies between overlapping hits, it yields accurate results for any DNA motif given its Position Frequency Matrix (PFM) representation. We demonstrate the accuracy of the p-value approximation by comparing it with the simulated count distribution. We also compare the approach with a binomial approximation, a (compound) Poisson approximation, and a normal approximation. In general, our approach either outperforms these approximations or matches their accuracy while being significantly faster. An implementation of our approach is available at http://mosta.molgen.mpg.de.
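
For orientation, the two simplest baselines named above, the binomial and the plain Poisson approximation, can be sketched in a few lines. This is only an illustrative sketch, not the method proposed in the abstract and not the code behind the linked implementation: it assumes every candidate position on either strand is an independent trial with the same hit probability, which is exactly the simplification that ignores overlapping hits. The function names, the parameter `p_hit`, and the example numbers are hypothetical.

```python
import math


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1.

    The probability mass is evaluated in log space so that large n
    does not overflow.
    """
    def pmf(i):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1)
                   - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        return math.exp(log_pmf)

    # Upper tail as one minus the lower tail; clipped at 0 against rounding.
    return max(0.0, 1.0 - sum(pmf(i) for i in range(k)))


def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), with lam > 0."""
    def pmf(i):
        return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))

    return max(0.0, 1.0 - sum(pmf(i) for i in range(k)))


def naive_enrichment_pvalues(observed_hits, seq_len, motif_len, p_hit):
    """Baseline p-values for seeing at least `observed_hits` motif hits.

    Assumption (hypothetical, for illustration): each of the
    2 * (seq_len - motif_len + 1) start positions on the two strands is an
    independent trial with hit probability `p_hit`.
    """
    n_positions = 2 * (seq_len - motif_len + 1)
    return {
        "binomial": binomial_tail(observed_hits, n_positions, p_hit),
        "poisson": poisson_tail(observed_hits, n_positions * p_hit),
    }


if __name__ == "__main__":
    # Made-up example values, not taken from the paper.
    print(naive_enrichment_pvalues(observed_hits=10, seq_len=5000,
                                   motif_len=8, p_hit=5e-4))
```

Because the upper tail is computed as one minus the lower tail, very small p-values lose precision in this sketch; a careful implementation would sum the upper tail directly.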

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2607244
DOI: http://dx.doi.org/10.1089/cmb.2007.0084
