Self-training is a classical semi-supervised learning approach that uses a small number of labeled samples and a large number of unlabeled samples to train a classifier. However, existing self-training algorithms consider only the geometric distance between samples and ignore the data distribution when calculating similarity. In addition, misclassified samples can severely degrade the performance of a self-training algorithm. To address these two problems, this paper proposes a self-training algorithm based on data editing with mass-based dissimilarity (STDEMB). First, the mass matrix is obtained using the mass-based dissimilarity, and the mass-based local density of each sample is then determined from its k nearest neighbors. Inspired by density peak clustering (DPC), this study designs a prototype tree based on the prototype concept. In addition, an efficient two-stage data editing algorithm is developed to edit misclassified samples and select high-confidence samples during the self-training process. The proposed STDEMB algorithm is evaluated experimentally using accuracy and F-score as metrics. Experimental results on 18 benchmark datasets demonstrate its effectiveness.
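For readers unfamiliar with the basic loop that the abstract builds on, the following is a minimal sketch of generic self-training, not of STDEMB itself. It assumes a plain Euclidean nearest-centroid classifier and a simple margin-based confidence score; the mass-based dissimilarity, prototype tree, and two-stage data editing described above are not implemented. All function names and the `threshold` and `max_rounds` parameters are hypothetical and chosen only for illustration.

```python
import math


def centroids(X, y):
    """Mean vector of each class in the labeled set."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: tuple(v / counts[c] for v in s) for c, s in sums.items()}


def predict_with_confidence(cents, x):
    """Return (label, confidence); assumes at least two classes.

    Confidence is 1 - d_nearest / d_second (an illustrative choice):
    close to 1 when x is much nearer one centroid, 0 when equidistant.
    """
    dists = sorted((math.dist(x, c), label) for label, c in cents.items())
    (d1, label), (d2, _) = dists[0], dists[1]
    conf = 1.0 - d1 / d2 if d2 > 0 else 0.0
    return label, conf


def self_train(X_lab, y_lab, X_unlab, threshold=0.5, max_rounds=10):
    """Generic self-training: pseudo-label confident samples, then retrain."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        cents = centroids(X_lab, y_lab)  # "train" on the current labeled set
        remaining, added = [], 0
        for x in pool:
            label, conf = predict_with_confidence(cents, x)
            if conf >= threshold:
                # High-confidence sample: move it to the labeled set.
                X_lab.append(x)
                y_lab.append(label)
                added += 1
            else:
                remaining.append(x)
        pool = remaining
        if added == 0 or not pool:
            break
    # Final model plus the samples that never became confident enough.
    return centroids(X_lab, y_lab), pool


if __name__ == "__main__":
    model, leftover = self_train(
        [(0, 0), (10, 10)], [0, 1],
        [(1, 1), (9, 9), (2, 1), (8, 9), (5, 5)],
    )
    print(model, leftover)
```

In this sketch a wrongly pseudo-labeled sample stays in the labeled set for all later rounds, which is the error-propagation problem that the data editing step in the abstract is meant to address.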
DOI: http://dx.doi.org/10.1016/j.neunet.2023.09.046
Neural Comput
January 2025
Department of Advanced Data Science, Institute of Statistical Mathematics, Tachikawa, Tokyo 190-8562, Japan
Standard domain adaptation methods do not work well when a large gap exists between the source and target domains. Gradual domain adaptation is one of the approaches used to address the problem. It involves leveraging the intermediate domain, which gradually shifts from the source domain to the target domain.
In unsupervised transfer learning for medical image segmentation, existing algorithms face the challenge of error propagation because source domain data are inaccessible. In response, a source-free domain transfer algorithm with reduced style sensitivity (SFDT-RSS) is designed. SFDT-RSS first pre-trains the source domain model using a generalization strategy and then adapts the pre-trained model to the target domain without accessing source data.
PLoS One
December 2024
School of Cyber Science and Engineering, Sichuan University, Chengdu, China.
The task of named entity recognition (NER) plays a crucial role in extracting cybersecurity-related information. Existing approaches for cybersecurity entity extraction predominantly rely on manually labelled data, resulting in labour-intensive processes due to the lack of a cybersecurity-specific corpus. In this paper, we propose an improved self-training-based distant label denoising method for cybersecurity entity extraction.
J Imaging Inform Med
December 2024
Zhuhai Hengqin Sanmed Aitech Inc., Zhuhai, Guangdong, China.
Circulating genetically abnormal cells (CACs) serve as crucial biomarkers for lung cancer diagnosis. Detecting CACs holds great value for early diagnosis and screening of lung cancer. To aid the identification of CACs, we have incorporated deep learning algorithms into our CAC detection system, specifically developing algorithms for cell segmentation and signal point detection.
Physiol Meas
January 2025
College of Electrical Engineering and Automation, Fuzhou University, Fuzhou 350108, Fujian, People's Republic of China.