We improve instability-based methods for selecting the number of clusters k in cluster analysis by developing a corrected clustering distance that removes the unwanted influence of the distribution of cluster sizes on cluster instability. We show that our corrected instability measure outperforms current instability-based measures across the whole sequence of possible k, overcoming limitations of current instability-based methods for large k. We also compare, for the first time, model-based and model-free approaches to determining cluster instability and find their performance to be comparable. We make our method available in the R package cstab.
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7550318 | PMC |
http://dx.doi.org/10.1007/s00180-020-00981-5 | DOI Listing |