Bayesian models and Markov chain Monte Carlo methods for protein motifs with the secondary characteristics.

J Comput Biol

Department of Statistics, Purdue University, 150 N. University Street, West Lafayette, IN 47907-2067, USA.

Published: September 2005

Statistical methods have been developed for finding local patterns, also called motifs, in multiple protein sequences. The aligned segments may indicate functional or structural core regions. However, existing methods often have difficulty aligning multiple proteins when sequence identity is low (e.g., less than 25%). In this article, we develop a Bayesian model and Markov chain Monte Carlo (MCMC) methods for identifying subtle motifs in protein sequences. Specifically, a motif is defined not only in terms of specific sites characterized by amino acid frequency vectors, but also as a combination of secondary characteristics such as hydrophobicity and polarity. MCMC methods are proposed to search for a motif pattern with high posterior probability under the new model. A special MCMC algorithm is developed, involving transitions between state spaces of different dimensions. The proposed methods were supported by a simulation study and then tested on two real datasets: a group of helix-turn-helix proteins and one set from the CATH Protein Structure Classification Database. Statistical comparisons showed that the new approach performed better than a typical Gibbs sampling approach based only on an amino acid model.
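
To make the motif definition above concrete, here is a minimal sketch of how a candidate segment might be scored when some motif columns are described by a full amino acid frequency vector and others only by a secondary characteristic. It is not the model from the article: the hydrophobic residue set, the uniform background, the rule that residues within a class follow the renormalized background, and all names (`residue_column`, `property_column`, `log_odds`) are assumptions chosen for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed, illustrative grouping; the article's characteristics may differ.
HYDROPHOBIC = set("AVLIMFWC")

# Uniform background distribution, assumed here for simplicity.
BACKGROUND = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def residue_column(freqs):
    """Motif column described by a full amino acid frequency vector."""
    return lambda aa: freqs[aa]


def property_column(p_hydrophobic):
    """Motif column described only by a hydrophobic / non-hydrophobic split.

    Within each class, residues follow the background distribution
    renormalized to that class (an assumption of this sketch).
    """
    in_mass = sum(BACKGROUND[aa] for aa in HYDROPHOBIC)
    out_mass = 1.0 - in_mass

    def prob(aa):
        if aa in HYDROPHOBIC:
            return p_hydrophobic * BACKGROUND[aa] / in_mass
        return (1.0 - p_hydrophobic) * BACKGROUND[aa] / out_mass

    return prob


def log_odds(segment, columns):
    """Log-likelihood ratio of a segment under the motif versus background."""
    return sum(
        math.log(col(aa) / BACKGROUND[aa]) for aa, col in zip(segment, columns)
    )


# Hypothetical two-column motif: glycine-rich site, then a hydrophobic site.
glycine_rich = {aa: 0.5 / 19 for aa in AMINO_ACIDS}
glycine_rich["G"] = 0.5
motif = [residue_column(glycine_rich), property_column(0.9)]

score_gl = log_odds("GL", motif)  # glycine followed by a hydrophobic residue
score_gk = log_odds("GK", motif)  # glycine followed by a charged residue
```

In this sketch a property-level column needs one free parameter instead of nineteen, which hints at why such columns could help when residue identities are low: any hydrophobic residue supports the pattern, so `score_gl` exceeds `score_gk` even though neither L nor K was singled out. A sampler in the spirit of the abstract would also have to decide which columns use which description, which is where moves between state spaces of different dimensions would come in.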

Source
http://dx.doi.org/10.1089/cmb.2005.12.952
