Use of runs statistics for pattern recognition in genomic DNA sequences.

J Comput Biol

Epidemiology Section, Cancer Etiology Program, Cancer Research Center of Hawaii, University of Hawaii, Honolulu, HI 96813-2479, USA.

Published: October 2004

This article introduces the finite Markov chain imbedding (FMCI) technique for studying patterns in DNA under a hidden Markov model (HMM). With the aim of studying multiple runs-related statistics simultaneously under an HMM through the FMCI technique, this work investigates a bivariate runs statistic under a binary HMM for DNA pattern recognition. An FMCI-based recursive algorithm is derived and implemented to determine the exact distribution of this bivariate runs statistic under an independent and identically distributed (IID) framework, a Markov chain (MC) framework, and a binary HMM framework. With this algorithm, we studied the distributions of the bivariate runs statistic under different binary HMM parameter sets; probabilistic profiles of runs are created and shown to be useful for trapping HMM maximum likelihood estimates (MLEs). This MLE-trapping scheme offers good initial estimates to jump-start the expectation-maximization (EM) algorithm in HMM parameter estimation and helps prevent the EM estimates from landing on a local maximum or a saddle point. Applications of the bivariate runs statistic and the probabilistic profiles, in conjunction with binary HMMs, to pattern recognition in genomic DNA sequences are illustrated via case studies on DNA bendability signals using human DNA data.
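
The abstract does not define the bivariate runs statistic or give the recursion, so the following is only a minimal sketch of the general FMCI idea, not the authors' algorithm. It assumes a hypothetical bivariate statistic (number of runs of 1s, length of the longest run of 1s) and covers only the Markov chain case, with IID as a special case. The function name `joint_runs_distribution` and its parameters `init` and `trans` are illustrative. The sequence is imbedded in a larger chain whose state also carries the running values of the statistic, and the exact joint distribution is read off after n steps.

```python
from collections import defaultdict


def joint_runs_distribution(n, init, trans):
    """Exact joint pmf of (number of 1-runs, longest 1-run) for a binary
    first-order Markov chain of length n >= 1.

    init[s]     : P(X_1 = s) for s in {0, 1}
    trans[r][s] : P(X_{t+1} = s | X_t = r)

    The pair of statistics is an assumption made for illustration only.
    """
    # Imbedded state: (last symbol, current 1-run length,
    #                  number of 1-runs so far, longest 1-run so far)
    dist = defaultdict(float)
    dist[(0, 0, 0, 0)] += init[0]
    dist[(1, 1, 1, 1)] += init[1]

    for _ in range(n - 1):
        nxt = defaultdict(float)
        for (last, cur, runs, longest), p in dist.items():
            # Next symbol is 0: any current run of 1s ends.
            nxt[(0, 0, runs, longest)] += p * trans[last][0]
            # Next symbol is 1: extend the current run or start a new one.
            new_cur = cur + 1
            new_runs = runs + (1 if last == 0 else 0)
            new_longest = max(longest, new_cur)
            nxt[(1, new_cur, new_runs, new_longest)] += p * trans[last][1]
        dist = nxt

    # Marginalize out the bookkeeping coordinates to get the joint pmf.
    pmf = defaultdict(float)
    for (_, _, runs, longest), p in dist.items():
        pmf[(runs, longest)] += p
    return dict(pmf)


# IID fair-coin special case: every row of trans equals init.
iid_pmf = joint_runs_distribution(3, [0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]])
```

For a length-3 fair-coin sequence, the eight equally likely outcomes can be enumerated by hand to check the result. Extending this sketch to a binary HMM would presumably require adding the hidden state to the imbedded state and weighting each step by the emission probabilities; that step is not shown here.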

Source
http://dx.doi.org/10.1089/106652704773416911
