In unsupervised learning literature, the study of clustering using microarray gene expression datasets has been extensively conducted with nonnegative matrix factorization (NMF), spectral clustering, kmeans, and gaussian mixture model (GMM)are some of the most used methods. However, there is still a limited number of works that utilize statistical analysis to measure the significances of performance differences between these methods. In this paper, statistical analysis of performance differences between ten NMF, six spectral clustering, four GMM, and the standard kmeans algorithms in clustering eleven publicly available microarray gene expression datasets with the number of clusters ranges from two to ten is presented. The experimental results show that statistically NMFs and kmeans have similar performances and outperform spectral clustering. As spectral clustering can be used to uncover hidden manifold structures, the underperformance of spectral methods leads us to question whether the datasets have manifold structures. Visual inspection using multidimensional scaling plots indicates that such structures do not exist. Moreover, as the plots indicate that clusters in some datasets have elliptical boundaries, GMM methods are also utilized. The experimental results show that GMM methods outperform the other methods to some degree, and thus imply that the datasets follow gaussian distributions.
Download full-text PDF |
Source |
---|---|
http://dx.doi.org/10.1109/TCBB.2020.3025486 | DOI Listing |
Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!