Modeling and visualizing uncertainty in gene expression clusters using Dirichlet process mixtures.

IEEE/ACM Trans Comput Biol Bioinform

Department of Engineering, University of Cambridge, Trumpington Street, Cambridge, CB2 1PZ, UK.

Published: February 2010

Although clustering has rapidly become one of the standard computational approaches to analyzing microarray gene expression data, little attention has been paid to the uncertainty in the results obtained. Dirichlet process mixture (DPM) models provide a nonparametric Bayesian alternative to the bootstrap for modeling uncertainty in gene expression clustering. Most previously published applications of Bayesian model-based clustering have been to short time-series data. In this paper, we present a case study applying nonparametric Bayesian clustering with full Gaussian covariances to high-dimensional, non-time-series gene expression data. We use the probability that two genes belong to the same cluster in a DPM model as a measure of the similarity of their expression profiles. Conversely, this probability can be used to define a dissimilarity measure, which, for visualization, can be passed to one of the standard linkage algorithms used in hierarchical clustering. Applied to the Rosetta compendium of expression profiles, the method yields biologically plausible results that extend previously published cluster analyses of these data.
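The similarity-then-linkage pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes we already have posterior samples of cluster labels (one label per gene per sample) from some DPM sampler, which is not shown. It estimates the pairwise co-clustering probability as the fraction of samples in which two genes share a label. That quantity is invariant to how clusters happen to be numbered in each sample, so label switching does not affect it. The dissimilarity is taken as one minus that probability, and average linkage is used. Both choices are illustrative assumptions, since the abstract specifies only "one of the standard linkage algorithms". The function names `coclustering_matrix`, `dissimilarity`, and `average_linkage` are hypothetical.

```python
from itertools import combinations


def coclustering_matrix(label_samples):
    """Estimate P[i][j] = Pr(gene i and gene j are in the same cluster).

    label_samples: list of posterior samples, each a list with one cluster
    label per gene (assumed to come from a DPM sampler, not shown here).
    Labels are compared only for equality within a sample, so arbitrary
    relabeling of clusters between samples does not change the result.
    """
    n_samples = len(label_samples)
    n_genes = len(label_samples[0])
    counts = [[0] * n_genes for _ in range(n_genes)]
    for labels in label_samples:
        for i in range(n_genes):
            for j in range(n_genes):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / n_samples for c in row] for row in counts]


def dissimilarity(prob):
    """Assumed dissimilarity: one minus the co-clustering probability."""
    return [[1 - p for p in row] for row in prob]


def average_linkage(dissim):
    """Naive average-linkage agglomerative clustering (illustrative only).

    Returns a list of merges (cluster_a, cluster_b, height), where each
    cluster is a frozenset of gene indices and height is the average
    pairwise dissimilarity between the two merged clusters.
    """
    clusters = [frozenset([i]) for i in range(len(dissim))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            ca, cb = clusters[a], clusters[b]
            d = sum(dissim[i][j] for i in ca for j in cb) / (len(ca) * len(cb))
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges


# Toy posterior samples for 4 genes (made-up labels, for illustration only).
# The last sample uses label 5 for genes 0 and 1 to show that label values
# themselves do not matter, only whether two genes share one.
samples = [
    [0, 0, 1, 1],
    [0, 0, 1, 2],
    [0, 0, 0, 1],
    [5, 5, 1, 1],
]
P = coclustering_matrix(samples)
D = dissimilarity(P)
for a, b, h in average_linkage(D):
    print(sorted(a), sorted(b), h)
# [0] [1] 0.0
# [2] [3] 0.5
# [0, 1] [2, 3] 0.875
```

Genes 0 and 1 always co-cluster, so they merge first at height 0. Genes 2 and 3 share a cluster in half the samples and merge next. In practice one would hand the condensed dissimilarity matrix to an optimized routine such as SciPy's hierarchical clustering and draw a dendrogram, rather than use this quadratic-per-step loop.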


Source: http://dx.doi.org/10.1109/TCBB.2007.70269

