The different measurement techniques that interrogate biological systems provide means for monitoring the behavior of virtually all cell components at different scales and from complementary angles. However, data generated in these experiments are difficult to interpret. A first difficulty arises from high-dimensionality and inherent noise of such data. Organizing them into meaningful groups is then highly desirable to improve our knowledge of biological mechanisms. A more accurate picture can be obtained when accounting for dependencies between components (e.g., genes) under study. A second difficulty arises from the fact that biological experiments often produce missing values. When it is not ignored, the latter issue has been solved by imputing the expression matrix prior to applying traditional analysis methods. Although helpful, this practice can lead to unsound results. We propose in this paper a statistical methodology that integrates individual dependencies in a missing data framework. More explicitly, we present a clustering algorithm dealing with incomplete data in a Hidden Markov Random Field context. This tackles the missing value issue in a probabilistic framework and still allows us to reconstruct missing observations a posteriori without imposing any pre-processing of the data. Experiments on synthetic data validate the gain in using our method, and analysis of real biological data shows its potential to extract biological knowledge.

Download full-text PDF

Source
http://dx.doi.org/10.1089/cmb.2008.0078DOI Listing

Publication Analysis

Top Keywords

markov random
8
random field
8
difficulty arises
8
data
7
missing
5
biological
5
model-based approach
4
approach gene
4
gene clustering
4
clustering missing
4

Similar Publications

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!