Clustering benchmark datasets exploiting the fundamental clustering problems.

Data Brief

Databionics Research Group, Philipps-University of Marburg, Hans-Meerwein-Straße 6, D-35032 Marburg, Germany.

Published: June 2020

The Fundamental Clustering Problems Suite (FCPS) offers a variety of clustering challenges that any algorithm should be able to handle when applied to real-world data. The FCPS consists of datasets with a priori known classifications that the algorithm is expected to reproduce. The datasets were intentionally created so that they can be visualized in two or three dimensions, under the hypothesis that the human eye can group the objects unambiguously. Each dataset represents a specific problem that known clustering algorithms solve with varying success. In the R package "Fundamental Clustering Problems Suite" on CRAN, samples of user-defined size can be drawn for the FCPS. Additionally, the distances of two high-dimensional datasets, Leukemia and Tetragonula, are provided here. This collection is useful for investigating the shortcomings of clustering algorithms and, for datasets of three or more dimensions, the limitations of dimensionality reduction methods. This article is a simultaneous co-submission with Swarm Intelligence for Self-Organized Clustering [1].
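
The benchmarking idea described in the abstract (draw a sample of chosen size with known classes, cluster it, then check whether the known classification is reproduced) can be sketched as follows. This is only an illustrative sketch in plain Python: the toy two-blob generator is a hypothetical stand-in and not one of the FCPS datasets, the clustering step is ordinary k-means chosen for brevity, and all function names are assumptions introduced here rather than part of the R package.

```python
import itertools
import random


def make_two_blobs(n_per_cluster, seed=0):
    """Draw a toy 2-D sample with known labels (NOT an actual FCPS dataset)."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (10.0, 10.0)]  # hypothetical, well-separated classes
    points, labels = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_cluster):
            points.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
            labels.append(label)
    return points, labels


def squared_distance(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return assignment


def best_match_accuracy(true_labels, predicted, k):
    """Accuracy under the best one-to-one relabeling of cluster indices."""
    best = 0.0
    for perm in itertools.permutations(range(k)):
        hits = sum(1 for t, p in zip(true_labels, predicted) if t == perm[p])
        best = max(best, hits / len(true_labels))
    return best


if __name__ == "__main__":
    points, labels = make_two_blobs(n_per_cluster=50)
    predicted = kmeans(points, k=2)
    print("accuracy:", best_match_accuracy(labels, predicted, k=2))
```

Because cluster indices returned by an algorithm are arbitrary, the comparison tries every relabeling and keeps the best match; that brute-force step is only practical for a small number of clusters. A dataset like this toy one is easy for k-means, whereas the point of a benchmark suite is to include structures on which such an algorithm may fail.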

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7195520
DOI: http://dx.doi.org/10.1016/j.dib.2020.105501
