Clustering time-course gene expression data (gene trajectories) is an important step toward gene regulatory network modeling and discovery, because it significantly reduces the dimensionality of the gene space that must be analyzed. Traditional clustering methods that hill-climb from randomly initialized cluster centers are prone to producing inconsistent and sub-optimal solutions across runs. This paper introduces a novel method that hybridizes a genetic algorithm (GA) with the expectation-maximization (EM) algorithm to cluster gene trajectories using mixtures of multiple linear regression models (MLRs), with the objective of improving the global optimality and consistency of the clustering. The proposed method is applied to human fibroblast and yeast time-course gene expression data, clustering genes by trajectory similarity. It significantly outperforms the standard EM method in both clustering accuracy and consistency. The biological implications of the improved clustering performance are demonstrated.
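The general idea of wrapping EM in a genetic search can be illustrated with a minimal sketch. This is not the paper's algorithm; the abstract does not specify the regression basis, the GA operators, or any parameter values, so everything below is an assumption made for illustration: a polynomial-in-time design matrix shared by all genes, spherical Gaussian noise per cluster, uniform crossover and random-reset mutation on hard label vectors, and the hypothetical function names `design`, `m_step`, `e_step`, `em`, and `ga_em`.

```python
import numpy as np

def design(times, degree=2):
    # Assumed regression basis: columns 1, t, t^2, ... (not taken from the paper).
    return np.vander(times, degree + 1, increasing=True)

def m_step(Y, X, R):
    # Y: (genes, timepoints), X: (timepoints, p), R: (genes, clusters) responsibilities.
    G, T = Y.shape
    Nk = R.sum(axis=0) + 1e-12                       # guard against empty clusters
    mean_traj = (R.T @ Y) / Nk[:, None]              # weighted mean trajectory per cluster
    # With a shared X, weighted least squares reduces to fitting the weighted mean.
    B = np.linalg.lstsq(X, mean_traj.T, rcond=None)[0].T        # (clusters, p)
    sq = ((Y[:, None, :] - (B @ X.T)[None]) ** 2).sum(axis=2)   # (genes, clusters)
    var = (R * sq).sum(axis=0) / (T * Nk) + 1e-6     # small variance floor
    return B, var, Nk / Nk.sum()

def e_step(Y, X, B, var, pi):
    T = Y.shape[1]
    sq = ((Y[:, None, :] - (B @ X.T)[None]) ** 2).sum(axis=2)
    logp = np.log(pi) - 0.5 * T * np.log(2 * np.pi * var) - 0.5 * sq / var
    m = logp.max(axis=1, keepdims=True)              # log-sum-exp for stability
    lse = m + np.log(np.exp(logp - m).sum(axis=1, keepdims=True))
    return np.exp(logp - lse), float(lse.sum())      # responsibilities, log-likelihood

def em(Y, X, labels, K, n_iter):
    # Run n_iter (>= 1) EM iterations starting from a hard assignment.
    R = np.eye(K)[labels]
    ll = -np.inf
    for _ in range(n_iter):
        B, var, pi = m_step(Y, X, R)
        R, ll = e_step(Y, X, B, var, pi)
    return R.argmax(axis=1), ll

def ga_em(Y, X, K, pop_size=8, generations=10, em_steps=3, mut_rate=0.02, seed=0):
    # Each individual is a label vector; fitness is the mixture log-likelihood
    # reached after a few EM steps. Requires pop_size >= 4.
    rng = np.random.default_rng(seed)
    G = Y.shape[0]
    pop = [rng.integers(K, size=G) for _ in range(pop_size)]
    best_labels, best_ll = None, -np.inf
    for _ in range(generations):
        scored = sorted((em(Y, X, ind, K, em_steps) for ind in pop),
                        key=lambda s: s[1], reverse=True)
        if scored[0][1] > best_ll:
            best_labels, best_ll = scored[0]
        parents = [lab for lab, _ in scored[: pop_size // 2]]   # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            i, j = rng.choice(len(parents), size=2, replace=False)
            # Uniform crossover; cluster indices are not aligned between parents,
            # which is a known weakness of this naive operator.
            child = np.where(rng.random(G) < 0.5, parents[i], parents[j])
            flip = rng.random(G) < mut_rate          # random-reset mutation
            child[flip] = rng.integers(K, size=flip.sum())
            children.append(child)
        pop = parents + children
    return best_labels, best_ll
```

Calling `ga_em(Y, design(times), K)` on a genes-by-timepoints array `Y` returns the best label vector found and its log-likelihood. Keeping the fitter half of the population and refining every individual with EM each generation is one simple way to combine global search with local hill-climbing; other selection and crossover schemes are equally plausible.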
DOI: http://dx.doi.org/10.1142/s0219720005001478