Toward Robust Self-Training Paradigm for Molecular Prediction Tasks.

J Comput Biol

Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas, USA.

Published: March 2024

Molecular prediction tasks normally require a series of specialized experiments to label each target molecule, so they suffer from a shortage of labeled data. Self-training, one of the semisupervised learning paradigms, makes use of both labeled and unlabeled data. Specifically, a teacher model is trained on the labeled data and produces pseudo labels for the unlabeled data. The labeled and pseudo-labeled data are then used jointly to train a student model. However, the pseudo labels generated by the teacher model are generally not sufficiently accurate. We therefore propose a robust self-training strategy that explores robust loss functions to handle such noisy labels in two paradigms: generic and adaptive. We conducted experiments on three molecular biology prediction tasks with four backbone models to progressively evaluate the performance of the proposed robust self-training strategy. The results demonstrate that the proposed method improves prediction performance across all tasks, notably in molecular regression tasks, where performance improved by 41.5% on average. Furthermore, visualization analysis confirms the superiority of our method. Our robust self-training is a simple yet effective strategy that efficiently improves molecular biology prediction performance. It addresses the shortage of labeled data in molecular biology by taking advantage of both labeled and unlabeled data. Moreover, it can easily be embedded in any prediction task, making it a universal approach for the bioinformatics community.
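The abstract describes the teacher, pseudo-label and student loop only at a high level. The following is a minimal sketch of that loop with a robust loss on the student, built on assumptions that do not come from the paper. It uses linear teacher and student models instead of the four molecular backbone models. It also uses Barron's (2019) general robust loss with a fixed shape parameter alpha as a stand-in for the "generic" robust loss; the abstract does not name the loss actually used. An "adaptive" variant would additionally learn alpha (and possibly the scale c) during training, which is not shown. All function names (`barron_loss`, `barron_grad`, `fit_linear`, `self_train`) are hypothetical.

```python
import numpy as np


def barron_loss(x, alpha=1.0, c=1.0):
    """General robust loss rho(x, alpha, c) from Barron (2019).

    Assumption: used here only as an illustrative robust loss. The cases
    alpha == 0 and alpha == 2 are limits of the formula and are rejected.
    """
    if alpha == 0.0 or alpha == 2.0:
        raise ValueError("alpha = 0 and alpha = 2 are limit cases not handled here")
    b = abs(alpha - 2.0)
    return (b / alpha) * (((x / c) ** 2 / b + 1.0) ** (alpha / 2.0) - 1.0)


def barron_grad(x, alpha=1.0, c=1.0):
    """Derivative of barron_loss with respect to the residual x."""
    b = abs(alpha - 2.0)
    return (x / c**2) * ((x / c) ** 2 / b + 1.0) ** (alpha / 2.0 - 1.0)


def fit_linear(X, y, loss_grad=None, lr=0.1, steps=2000):
    """Fit y ~ X @ w + bias by full-batch gradient descent.

    loss_grad maps residuals to per-sample gradients; None means the squared
    loss 0.5 * r**2, whose gradient is the residual itself.
    """
    n, d = X.shape
    w, bias = np.zeros(d), 0.0
    for _ in range(steps):
        r = X @ w + bias - y                       # residuals (prediction - target)
        g = r if loss_grad is None else loss_grad(r)
        w -= lr * (X.T @ g) / n
        bias -= lr * g.mean()
    return w, bias


def self_train(X_lab, y_lab, X_unlab, alpha=1.0, c=1.0):
    """Hypothetical teacher-student self-training with a robust student loss."""
    # 1. Teacher: trained on labeled data only, with the ordinary squared loss.
    w_t, b_t = fit_linear(X_lab, y_lab)
    # 2. Teacher assigns (possibly noisy) pseudo labels to the unlabeled data.
    y_pseudo = X_unlab @ w_t + b_t
    # 3. Student: trained jointly on labeled + pseudo-labeled data. For alpha < 2
    #    the robust loss grows more slowly than the squared loss, so samples
    #    with large residuals have less influence on the update.
    X_all = np.vstack([X_lab, X_unlab])
    y_all = np.concatenate([y_lab, y_pseudo])
    return fit_linear(X_all, y_all, loss_grad=lambda r: barron_grad(r, alpha, c))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_true = np.array([1.5, -2.0, 0.5])
    X_lab = rng.normal(size=(20, 3))                        # small labeled set
    y_lab = X_lab @ w_true + 1.0 + 0.1 * rng.normal(size=20)
    X_unlab = rng.normal(size=(200, 3))                     # larger unlabeled set
    w_s, b_s = self_train(X_lab, y_lab, X_unlab)
    print("student weights:", w_s, "bias:", b_s)
```

With c fixed, a smaller alpha makes the loss saturate more strongly for large residuals, so pseudo labels that disagree sharply with the model contribute less; alpha = 1 gives the smooth, L1-like Charbonnier loss. In this linear toy the pseudo labels lie exactly on the teacher's fit, so the example shows the mechanics of the loop rather than the gains reported in the abstract.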

Source: http://dx.doi.org/10.1089/cmb.2023.0187