Active Clustering Ensemble With Self-Paced Learning.

IEEE Trans Neural Netw Learn Syst

Published: September 2024

AI Article Synopsis

  • A clustering ensemble combines multiple clustering results to improve accuracy, but traditional methods can struggle with unreliable data due to the lack of labels.
  • We propose a new approach called self-paced active clustering ensemble (SPACE), which actively selects uncertain data for labeling during the clustering process.
  • By integrating active learning with a self-paced approach, SPACE enhances clustering performance by addressing unreliable instances and optimizing the selection of both challenging and easier data for better ensemble outcomes.
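To make the first point concrete, here is a minimal sketch of one common way to combine several clusterings: a co-association matrix followed by a thresholded consensus. This is a generic illustration and not the SPACE method. The function names `co_association` and `consensus_by_threshold`, the 0.5 threshold, and the toy partitions are all assumptions made for the example.

```python
import numpy as np

def co_association(partitions):
    """Fraction of base clusterings that place each pair of points together."""
    partitions = np.asarray(partitions)   # shape (m, n): m clusterings, n points
    m, n = partitions.shape
    co = np.zeros((n, n))
    for labels in partitions:
        # Pairwise "same cluster" indicator for this base clustering.
        co += (labels[:, None] == labels[None, :])
    return co / m

def consensus_by_threshold(co, threshold=0.5):
    """Connected components of the graph that links pairs with co > threshold."""
    n = co.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:                      # depth-first traversal of one component
            i = stack.pop()
            for j in np.nonzero(co[i] > threshold)[0]:
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Toy input (hypothetical): three base clusterings of six points.
base_partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],   # same grouping, different label names
    [0, 0, 1, 1, 2, 2],   # a noisier base clustering
]
co = co_association(base_partitions)
consensus = consensus_by_threshold(co)
print(consensus)
```

The co-association matrix ignores label names, so the first two base clusterings count as full agreement. Entries well between 0 and 1 mark pairs the base clusterings disagree on, which is where unreliable instances tend to show up.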

Article Abstract

A clustering ensemble provides an elegant framework for learning a consensus result from multiple prespecified clustering partitions. Although conventional clustering ensemble methods achieve promising performance in various applications, we observe that they are often misled by unreliable instances because no labels are available. To tackle this issue, we propose a novel active clustering ensemble method that selects uncertain or unreliable data and queries their annotations during the ensemble process. To realize this idea, we integrate the active clustering ensemble method into a self-paced learning framework, yielding a novel self-paced active clustering ensemble (SPACE) method. SPACE jointly selects unreliable data to label by automatically evaluating their difficulty and uses easy data to ensemble the clusterings. In this way, the two tasks reinforce each other, with the aim of achieving better clustering performance. Experimental results on benchmark datasets demonstrate the effectiveness of our method. The code for this article is released at https://Doctor-Nobody.github.io/codes/space.zip.
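The split between easy data (used for the ensemble) and unreliable data (sent for annotation) can be illustrated with a small sketch. It is not the formulation from the article. It assumes a hard self-paced weighting, where an instance gets weight 1 if its difficulty is at or below a pace parameter and 0 otherwise, and a hypothetical difficulty score computed from the ambiguity of co-association entries. The names `instance_difficulty` and `self_paced_split`, the `pace` value, and the toy matrix are all illustrative.

```python
import numpy as np

def instance_difficulty(co):
    """Mean ambiguity of each point's co-association row (0 = fully consistent)."""
    co = np.asarray(co, dtype=float)
    ambiguity = 4.0 * co * (1.0 - co)   # equals 1 when co == 0.5, 0 when co is 0 or 1
    return ambiguity.mean(axis=1)

def self_paced_split(difficulty, pace, n_queries):
    """Return hard self-paced weights and the indices to send to an annotator."""
    easy = difficulty <= pace           # weight 1 for instances easy enough at this pace
    hard = np.flatnonzero(~easy)
    # Query the most difficult of the remaining instances first.
    order = hard[np.argsort(-difficulty[hard], kind="stable")]
    return easy.astype(float), order[:n_queries]

# Toy co-association matrix (hypothetical): point 2 is grouped inconsistently.
co = np.array([
    [1.0, 1.0, 0.5, 0.0],
    [1.0, 1.0, 0.5, 0.0],
    [0.5, 0.5, 1.0, 0.5],
    [0.0, 0.0, 0.5, 1.0],
])
difficulty = instance_difficulty(co)
weights, queries = self_paced_split(difficulty, pace=0.5, n_queries=1)
print(weights, queries)
```

In a self-paced scheme the pace is typically raised over iterations so that harder instances are admitted gradually. Here the labels obtained for the queried instances would feed back into the next round of consensus learning.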

Source
http://dx.doi.org/10.1109/TNNLS.2023.3252586

Publication Analysis

Top Keywords

clustering ensemble: 24
active clustering: 16
ensemble: 8
self-paced learning: 8
ensemble method: 8
unreliable data: 8
clustering: 7
active: 4
ensemble self-paced: 4
learning clustering: 4
