Clustering and community detection provide a concise way of extracting meaningful information from large datasets. An ever growing plethora of data clustering and community detection algorithms have been proposed. In this paper, we address the question of ranking the performance of clustering algorithms for a given dataset. We show that, for hard clustering and community detection, Linsker's Infomax principle can be used to rank clustering algorithms. In brief, the algorithm that yields the highest value of the entropy of the partition, for a given number of clusters, is the best one. We show indeed, on a wide range of datasets of various sizes and topological structures, that the ranking provided by the entropy of the partition over a variety of partitioning algorithms is strongly correlated with the overlap with a ground truth partition The codes related to the project are available in https://github.com/Sandipan99/Ranking_cluster_algorithms.

Download full-text PDF

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7588117PMC
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0239331PLOS

Publication Analysis

Top Keywords

clustering algorithms
12
clustering community
12
community detection
12
entropy partition
8
clustering
6
algorithms
5
unsupervised ranking
4
ranking clustering
4
algorithms infomax
4
infomax clustering
4

Similar Publications

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!