A unified view of density-based methods for semi-supervised clustering and classification.

Data Min Knowl Discov

School of Mathematical and Physical Sciences, University of Newcastle, University Drive, Callaghan, NSW 2308 Australia.

Published: July 2020

Semi-supervised learning is drawing increasing attention in the era of big data, as the gap widens between the abundance of cheap, automatically collected unlabeled data and the scarcity of labeled data that are laborious and expensive to obtain. In this paper, we first introduce a unified view of density-based clustering algorithms. We then build upon this view and bridge the areas of semi-supervised clustering and classification under a common umbrella of density-based techniques. We show that there are close relations between density-based clustering algorithms and the graph-based approach for transductive classification. These relations are then used as a basis for a new framework for semi-supervised classification based on building blocks from density-based clustering. This framework is not only efficient and effective but also statistically sound. In addition, we generalize the core algorithm in our framework, HDBSCAN*, so that it can also perform semi-supervised clustering by directly taking advantage of any fraction of labeled data that may be available. Experimental results on a large collection of datasets show the advantages of the proposed approach for both semi-supervised classification and semi-supervised clustering.
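
The link between density-based clustering and graph-based transductive classification can be illustrated with a small sketch. It is not the framework from the paper, whose details the abstract does not give. It assumes the mutual reachability distance as commonly defined for HDBSCAN*, and it swaps in a simple Prim-style expansion from the labeled points as a stand-in for label propagation. The function names, the `min_pts` parameter, and the use of `-1` to mark unlabeled points are all hypothetical choices made for this example.

```python
import numpy as np


def mutual_reachability(X, min_pts=3):
    """Pairwise mutual reachability distances (common HDBSCAN* definition)."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Core distance: distance to the min_pts-th nearest neighbor,
    # counting the point itself as its own first neighbor.
    core = np.sort(dist, axis=1)[:, min_pts - 1]
    mreach = np.maximum(dist, np.maximum(core[:, None], core[None, :]))
    np.fill_diagonal(mreach, 0.0)
    return mreach


def propagate_labels(mreach, labels):
    """Illustrative seeded expansion; -1 marks an unlabeled point.

    Repeatedly attaches the unlabeled point that is closest (in mutual
    reachability distance) to any already-labeled point and copies that
    point's label. This is an assumption-laden stand-in, not the paper's
    classifier.
    """
    labels = np.asarray(labels).copy()
    done = labels != -1
    if not done.any():
        raise ValueError("at least one labeled point is required")
    while not done.all():
        # Cheapest edge from the labeled set to the unlabeled set.
        sub = mreach[np.ix_(done, ~done)]
        i, j = np.unravel_index(np.argmin(sub), sub.shape)
        src = np.flatnonzero(done)[i]
        dst = np.flatnonzero(~done)[j]
        labels[dst] = labels[src]
        done[dst] = True
    return labels


if __name__ == "__main__":
    # Two well-separated groups with one labeled point each.
    X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
    seeds = [0, -1, -1, 1, -1, -1]
    print(propagate_labels(mutual_reachability(X, min_pts=2), seeds))
```

Because labels spread along the cheapest mutual reachability edges, points in sparse regions are attached late, which is the density-based intuition the abstract refers to. The pairwise distance matrix makes this sketch quadratic in memory, so it is suitable only for small inputs.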

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7410108
DOI: http://dx.doi.org/10.1007/s10618-019-00651-1

Publication Analysis

Top Keywords

semi-supervised clustering: 16
density-based clustering: 12
unified view: 8
view density-based: 8
clustering classification: 8
labeled data: 8
clustering algorithms: 8
semi-supervised classification: 8
semi-supervised: 7
clustering: 7

Similar Publications

Resolving heterogeneity of early-onset major depressive disorder through individual differential structural covariance network analysis.

J Affect Disord

January 2025

Department of Child and Adolescent Psychiatry, Affiliated Brain Hospital, Guangzhou Medical University, Guangzhou, China; The First School of Clinical Medicine, Southern Medical University, Guangzhou, China; Key Laboratory of Neurogenetics and Channelopathies of Guangdong Province and the Ministry of Education of China, Guangzhou Medical University, Guangzhou, China; Guangdong Engineering Technology Research Center for Translational Medicine of Mental Disorders, Guangzhou, China.

Background: Early-onset major depressive disorder (EO-MDD) is characterized by its significant heterogeneity, hindering progress in research. Traditional case-control studies, like group-level structural covariance network, struggle to capture individual heterogeneity among EO-MDD patients.

Methods: In this study, T1-weighted structural magnetic resonance imaging was obtained from 185 participants, including 103 EO-MDD patients and 82 healthy controls.

In credit risk assessment, unsupervised classification techniques can be introduced to reduce human resource expenses and expedite decision-making. Despite the efficacy of unsupervised learning methods in handling unlabeled datasets, their performance remains limited owing to challenges such as imbalanced data, local optima, and parameter adjustment complexities. Thus, this paper introduces a novel hybrid unsupervised classification method, named the two-stage hybrid system with spectral clustering and semi-supervised support vector machine (TSC-SVM), which effectively addresses the unsupervised imbalance problem in credit risk assessment by targeting global optimal solutions.

Existing emotion-driven music generation models heavily rely on labeled data and lack interpretability and controllability of emotions. To address these limitations, a semi-supervised emotion-driven music generation model based on category-dispersed Gaussian mixture variational autoencoders is proposed. Initially, a controllable music generation model is introduced, which disentangles and manipulates rhythm and tonal features, enabling controlled music generation.

Rapid Eye Movement (REM) sleep behavior disorder (RBD) affects nearly half of Parkinson's disease (PD) patients. However, the structural heterogeneity within the brainstem, which regulates REM sleep, remains largely unexplored in PD. Our objective was to identify distinct PD subtypes based on microstructural characteristics in the brainstem and examine their associations with the severity of RBD.

Fatigue plays a critical role in sports science, significantly affecting recovery, training effectiveness, and overall athletic performance. Understanding and predicting fatigue is essential to optimize training, prevent overtraining, and minimize the risk of injuries. The aim of this study is to leverage Human Activity Recognition (HAR) through deep learning methods for dimensionality reduction.
