Semi-supervised learning (SSL) suffers from severe performance degradation when labeled and unlabeled data come from inconsistent and imbalanced distributions. Nonetheless, there is little theoretical guidance on how to remedy this issue. To bridge the gap between theoretical insights and practical solutions, we analyze the generalization bounds of classic SSL algorithms. This analysis reveals that distribution inconsistency between unlabeled and labeled data can significantly enlarge the generalization error bound. Motivated by this theoretical insight, we present a Triplet Adaptation Framework (TAF) to reduce the distribution divergence and improve the generalization of SSL models. TAF comprises three adapters: a Balanced Residual Adapter, which maps the class distributions of labeled and unlabeled data to a uniform distribution to reduce class-distribution divergence; a Representation Adapter, which maps the representation distribution of unlabeled data to that of labeled data to reduce representation-distribution divergence; and a Pseudo-Label Adapter, which aligns predicted pseudo-labels with the class distribution of unlabeled data, preventing erroneous pseudo-labels from exacerbating representation divergence. The three adapters work synergistically to reduce the generalization bound, ultimately yielding a more robust and generalizable SSL model. Extensive experiments across various robust SSL scenarios validate the efficacy of our method.
DOI: http://dx.doi.org/10.1109/TPAMI.2024.3404450
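No code accompanies the abstract above; the distribution-alignment idea behind its Pseudo-Label Adapter can be illustrated with a minimal NumPy sketch (function and variable names are hypothetical, and this is a generic alignment step, not the authors' implementation):

```python
import numpy as np

def align_pseudo_labels(probs, target_prior, running_mean, eps=1e-8):
    """Rescale per-sample class probabilities so that, on average, they
    match an estimated class prior of the unlabeled data. A hypothetical
    distribution-alignment sketch, not the authors' exact adapter."""
    scaled = probs * (target_prior / (running_mean + eps))
    return scaled / scaled.sum(axis=1, keepdims=True)

# toy example: the model over-predicts class 0 on unlabeled data
probs = np.array([[0.7, 0.2, 0.1],
                  [0.6, 0.3, 0.1]])
target = np.full(3, 1 / 3)        # assume a uniform unlabeled class prior
running = probs.mean(axis=0)      # running estimate of the model's output
aligned = align_pseudo_labels(probs, target, running)
```

Scaling by `target_prior / running_mean` pushes the average prediction toward the assumed class prior, which is how distribution alignment counteracts a biased pseudo-labeler.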
Med Biol Eng Comput
January 2025
School of Software, Jiangxi Normal University, Nanchang, 330022, China.
Source-free domain adaptation (SFDA) has become crucial in medical image analysis, enabling the adaptation of source models across diverse datasets without labeled target-domain images. Self-training, a popular SFDA approach, iteratively refines self-generated pseudo-labels on unlabeled target-domain data to adapt a model pre-trained on the source domain. However, it often suffers from model instability caused by the accumulation of incorrect pseudo-labels and by foreground-background class imbalance.
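The self-training loop this abstract describes is not given in code; a hedged sketch of one common ingredient, confidence filtering with a per-class cap to curb pseudo-label error accumulation and class imbalance (all names are illustrative, not the paper's):

```python
import numpy as np

def select_pseudo_labels(probs, threshold=0.9, per_class_cap=None):
    """One filtering step of a self-training round: keep only
    high-confidence predictions, optionally capping each class to
    curb class imbalance. A generic sketch, not the surveyed
    paper's exact procedure."""
    conf = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    keep = conf >= threshold
    if per_class_cap is not None:
        for c in np.unique(labels):
            idx = np.where(keep & (labels == c))[0]
            if len(idx) > per_class_cap:
                # drop all but the most confident examples of class c
                drop = idx[np.argsort(conf[idx])[:-per_class_cap]]
                keep[drop] = False
    return np.where(keep)[0], labels[keep]

# toy usage: 5 samples, 2 classes, keep at most 2 per class
probs = np.array([[0.95, 0.05],
                  [0.60, 0.40],
                  [0.08, 0.92],
                  [0.97, 0.03],
                  [0.99, 0.01]])
idx, lab = select_pseudo_labels(probs, threshold=0.9, per_class_cap=2)
```

In a full self-training round, the model would be retrained on `(idx, lab)` and the filtering repeated with fresh predictions.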
Self-supervised learning (SSL) is an approach to pretrain models with unlabeled datasets and extract useful feature representations such that these models can be easily fine-tuned for various downstream tasks. Self-pretraining applies SSL on curated task-specific datasets without using task-specific labels. The increasing availability of public data repositories has now made it possible to utilize diverse, large, task-unrelated datasets to pretrain models in the "wild" using SSL.
Brief Bioinform
November 2024
Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, 15213, USA.
Cryo-electron tomography (cryo-ET) is confronted with the intricate task of unveiling novel structures. General class discovery (GCD) seeks to identify new classes by learning a model that can pseudo-label unannotated (novel) instances solely using supervision from labeled (base) classes. While 2D GCD for image data has made strides, its 3D counterpart remains unexplored.
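GCD as summarized above pseudo-labels unannotated instances using only base-class supervision; a toy nearest-prototype version of that step (purely illustrative, not the paper's 3D method) might look like:

```python
import numpy as np

def discover_classes(feats, base_feats, base_labels, novelty_thresh):
    """Toy GCD-style pseudo-labeling: assign each unannotated feature
    to the nearest base-class prototype, or flag it as novel (-1) when
    it is farther than `novelty_thresh` from every prototype.
    A hypothetical illustration, not the paper's method."""
    protos = np.stack([base_feats[base_labels == c].mean(axis=0)
                       for c in np.unique(base_labels)])
    d = np.linalg.norm(feats[:, None, :] - protos[None, :, :], axis=2)
    nearest = d.argmin(axis=1)
    return np.where(d.min(axis=1) <= novelty_thresh, nearest, -1)

# toy 2D features: two base classes plus one far-away (novel) point
base_feats = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 0.0], [5.2, 0.0]])
base_labels = np.array([0, 0, 1, 1])
feats = np.array([[0.1, 0.0], [5.1, 0.0], [10.0, 10.0]])
labels = discover_classes(feats, base_feats, base_labels, novelty_thresh=2.0)
```

Real GCD systems cluster the novel (-1) instances further into new classes; this sketch only shows the base-supervised pseudo-labeling step.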
J Hazard Mater
January 2025
State Key Laboratory of Green Pesticide, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan 430079, China. Electronic address:
Ecotoxicity assessments, which rely on animal testing, face serious challenges, including high costs and ethical concerns. Computational toxicology presents a promising alternative; nevertheless, existing predictive models encounter difficulties such as limited datasets and pronounced overfitting. To address these issues, we propose a framework for predicting pesticide ecotoxicity using graph contrastive learning (PE-GCL).
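The abstract does not define PE-GCL's objective; a generic graph-contrastive (NT-Xent/InfoNCE) loss over embeddings of two augmented views, written in NumPy as a hedged sketch of the core idea, is:

```python
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent contrastive loss: row i of z1 and z2 are embeddings of
    two augmented views of the same graph (a positive pair); all other
    rows act as negatives. A generic sketch, not PE-GCL's exact loss."""
    n = len(z1)
    z = np.concatenate([z1, z2])
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)                    # exclude self-pairs
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_prob = (sim[np.arange(2 * n), pos]
                - np.log(np.exp(sim).sum(axis=1)))
    return -log_prob.mean()

# toy check: matched views should score lower loss than mismatched ones
z1 = np.eye(4)
loss_matched = nt_xent(z1, z1)
loss_shuffled = nt_xent(z1, np.roll(z1, 1, axis=0))
```

Minimizing this loss pulls the two views of each graph together while pushing apart all other graphs in the batch, which is the mechanism contrastive pretraining uses to cope with small labeled datasets.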
Nat Commun
January 2025
National-Local Joint Engineering Laboratory of Druggability and New Drug Evaluation, National Engineering Research Center for New Drug and Druggability (cultivation), Guangdong Province Key Laboratory of New Drug Design and Evaluation, School of Pharmaceutical Sciences, Sun Yat-Sen University, Guangzhou, 510006, China.
Epitranscriptomic modifications, particularly N6-methyladenosine (m6A), are crucial regulators of gene expression, influencing processes such as RNA stability, splicing, and translation. Traditional computational methods for detecting m6A from Nanopore direct RNA sequencing (DRS) data are constrained by their reliance on experimentally validated labels, often resulting in the underestimation of modification sites. Here, we introduce pum6a, an innovative attention-based framework that integrates positive and unlabeled multi-instance learning (MIL) to address the challenges of incomplete labeling and missing read-level annotations.
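The attention-based multi-instance aggregation this abstract mentions is not spelled out; a generic attention-MIL pooling step (hypothetical parameter names, not pum6a's architecture) can be sketched as:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_mil(read_feats, w, v):
    """Attention-based MIL pooling: score each read (instance),
    softmax the scores into weights, and return the weighted bag
    (site-level) embedding plus the weights. A generic sketch with
    hypothetical parameters, not pum6a's model."""
    scores = np.tanh(read_feats @ v) @ w  # per-instance attention logits
    alpha = softmax(scores)               # normalized instance weights
    bag = alpha @ read_feats              # weighted site-level embedding
    return bag, alpha

# toy usage: 6 reads with 8-dim features, random attention parameters
rng = np.random.default_rng(0)
read_feats = rng.normal(size=(6, 8))
v = rng.normal(size=(8, 4))
w = rng.normal(size=4)
bag, alpha = attention_mil(read_feats, w, v)
```

In a positive-unlabeled MIL setting, a classifier on the bag embedding predicts whether a site is modified, while the attention weights indicate which reads drove that call, addressing the missing read-level annotations.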