A self-training algorithm based on the two-stage data editing method with mass-based dissimilarity.

Neural Netw

School of Computer Science and Center for OPTical IMagery Analysis and Learning (OPTIMAL), Northwestern Polytechnical University, Xi'an 710072, Shaanxi, China.

Published: November 2023

Self-training is a classical semi-supervised learning approach that uses a small number of labeled samples and a large number of unlabeled samples to train a classifier. However, existing self-training algorithms consider only the geometric distance between samples and ignore the data distribution when calculating similarity. In addition, misclassified samples can severely degrade the performance of a self-training algorithm. To address these two problems, this paper proposes a self-training algorithm based on data editing with mass-based dissimilarity (STDEMB). First, the mass matrix is obtained using the mass-based dissimilarity, and then the mass-based local density of each sample is determined from its k nearest neighbors. Inspired by density peak clustering (DPC), this study designs a prototype tree based on the prototype concept. In addition, a two-stage data editing algorithm is developed to edit misclassified samples and efficiently select high-confidence samples during the self-training process. The proposed STDEMB algorithm is evaluated experimentally using accuracy and F-score as metrics. The experimental results on 18 benchmark datasets demonstrate its effectiveness.
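For orientation, the generic self-training loop that the abstract builds on can be sketched as follows. This is a minimal illustration, not the STDEMB method: it assumes a nearest-centroid classifier with plain Euclidean distance, at least two labeled classes, and a hypothetical confidence threshold. The function names (`centroid`, `predict_with_confidence`, `self_train`) and the confidence score are illustrative choices that do not come from the paper.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def predict_with_confidence(centroids, x):
    """Return (label, confidence) for x under a nearest-centroid rule.

    Confidence is d_second / (d_best + d_second): 0.5 when x is equidistant
    from the two closest centroids, approaching 1.0 as x nears the best one.
    Assumes at least two classes (an assumption of this sketch).
    """
    dists = {label: math.dist(c, x) for label, c in centroids.items()}
    best, second = sorted(dists, key=dists.get)[:2]
    total = dists[best] + dists[second]
    confidence = 1.0 if total == 0 else dists[second] / total
    return best, confidence


def self_train(labeled, unlabeled, threshold=0.8, max_rounds=10):
    """Iteratively pseudo-label unlabeled points the classifier is confident about.

    labeled:   list of (point, label) pairs
    unlabeled: list of points
    Returns the enlarged labeled set and the points that stayed unlabeled.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        # Refit the classifier (here: class centroids) on the current labeled set.
        by_class = {}
        for x, y in labeled:
            by_class.setdefault(y, []).append(x)
        centroids = {y: centroid(xs) for y, xs in by_class.items()}

        # Split the pool into confidently predicted points and the rest.
        accepted, remaining = [], []
        for x in pool:
            label, confidence = predict_with_confidence(centroids, x)
            if confidence >= threshold:
                accepted.append((x, label))
            else:
                remaining.append(x)

        if not accepted:  # nothing confident enough: stop early
            break
        labeled.extend(accepted)
        pool = remaining
    return labeled, pool


if __name__ == "__main__":
    seed = [((0.0, 0.0), "a"), ((10.0, 10.0), "b")]
    points = [(1.0, 1.0), (9.0, 9.0), (5.0, 5.0)]
    enlarged, leftover = self_train(seed, points)
    print(enlarged)
    print(leftover)
```

In this sketch a wrongly accepted pseudo-label stays in the labeled set and shifts the centroids in later rounds, which is the error-propagation problem the abstract attributes to misclassified samples. According to the abstract, STDEMB differs in two ways that are not implemented here: it replaces the geometric distance with a mass-based dissimilarity, and it adds a two-stage data editing step to edit misclassified samples.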

Source
http://dx.doi.org/10.1016/j.neunet.2023.09.046

Publication Analysis

Top Keywords

self-training algorithm: 16
data editing: 12
mass-based dissimilarity: 12
algorithm based: 8
two-stage data: 8
misclassified samples: 8
proposed stdemb: 8
stdemb algorithm: 8
algorithm: 7
self-training: 6

Similar Publications

Gradual Domain Adaptation via Normalizing Flows.

Neural Comput

January 2025

Department of Advanced Data Science, Institute of Statistical Mathematics, Tachikawa, Tokyo 190-8562, Japan

Standard domain adaptation methods do not work well when a large gap exists between the source and target domains. Gradual domain adaptation is one of the approaches used to address the problem. It involves leveraging the intermediate domain, which gradually shifts from the source domain to the target domain.

In unsupervised transfer learning for medical image segmentation, existing algorithms face the challenge of error propagation due to inaccessible source domain data. In response to this scenario, a source-free domain transfer algorithm with reduced style sensitivity (SFDT-RSS) is designed. SFDT-RSS initially pre-trains the source domain model using the generalization strategy and subsequently adapts the pre-trained model to the target domain without accessing source data.

The task of named entity recognition (NER) plays a crucial role in extracting cybersecurity-related information. Existing approaches for cybersecurity entity extraction predominantly rely on manually labelled data, resulting in labour-intensive processes due to the lack of a cybersecurity-specific corpus. In this paper, we propose an improved self-training-based distant label denoising method for cybersecurity entity extraction.

Circulating genetically abnormal cells (CACs) serve as crucial biomarkers for lung cancer diagnosis. Detecting CACs holds great value for early diagnosis and screening of lung cancer. To aid the identification of CACs, we have incorporated deep learning algorithms into our CAC detection system, specifically developing algorithms for cell segmentation and signal point detection.

An adaptive learning method for long-term gesture recognition based on surface electromyography.

Physiol Meas

January 2025

College of Electrical Engineering and Automation, Fuzhou University, Fuzhou 350108, Fujian, People's Republic of China.

Article Synopsis
  • To overcome these challenges, the authors propose a new method that involves optimizing feature extraction, dimensionality reduction, and model calibration to recognize EMG signals effectively over time.
  • Their method, tested over 30 days of data, shows over 90% accuracy in gesture recognition using only one set of unlabeled samples, demonstrating both high performance and convenience for daily use.