Toward Robust Self-Training Paradigm for Molecular Prediction Tasks.

Hehuan Ma Feng Jiang Yu Rong Yuzhi Guo Junzhou Huang

J Comput Biol

Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas, USA.

Published: March 2024

Molecular prediction tasks normally require a series of specialized experiments to label each target molecule, so labeled data are scarce. Self-training, a semisupervised learning paradigm, uses both labeled and unlabeled data. Specifically, a teacher model is trained on labeled data and produces pseudo labels for unlabeled data. The labeled and pseudo-labeled data are then used jointly to train a student model. However, the pseudo labels generated by the teacher model are generally not sufficiently accurate. We therefore propose a robust self-training strategy that explores robust loss functions to handle such noisy labels under two paradigms, generic and adaptive. We conducted experiments on three molecular biology prediction tasks with four backbone models to progressively evaluate the proposed robust self-training strategy. The results demonstrate that the proposed method improves prediction performance across all tasks, most notably on molecular regression tasks, where the average improvement is 41.5%. Furthermore, visualization analysis confirms the superiority of our method. The proposed robust self-training is a simple yet effective strategy that efficiently improves molecular biology prediction performance. It addresses the shortage of labeled data in molecular biology by taking advantage of both labeled and unlabeled data. Moreover, it can be easily embedded into any prediction task, making it a universal approach for the bioinformatics community.
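The teacher-student procedure described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' implementation: the models are one-dimensional linear regressors rather than molecular backbones, the Huber loss stands in for the robust loss (the abstract does not say which loss is used), and the function names fit_least_squares, huber_grad, fit_huber, and self_train are hypothetical.

```python
from typing import List, Tuple


def fit_least_squares(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Teacher: closed-form least-squares line (assumes xs are not all equal)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = cov_xy / var_x
    return w, mean_y - w * mean_x


def huber_grad(residual: float, delta: float) -> float:
    """Derivative of the Huber loss w.r.t. the residual: linear inside
    [-delta, delta], clipped to +/-delta outside, which bounds the
    influence of badly wrong (pseudo) labels."""
    if abs(residual) <= delta:
        return residual
    return delta if residual > 0 else -delta


def fit_huber(xs: List[float], ys: List[float], delta: float = 1.0,
              lr: float = 0.1, steps: int = 2000) -> Tuple[float, float]:
    """Student: linear model trained by gradient descent on the mean Huber loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            g = huber_grad(w * x + b - y, delta)
            grad_w += g * x
            grad_b += g
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def self_train(labeled_x: List[float], labeled_y: List[float],
               unlabeled_x: List[float], delta: float = 1.0) -> Tuple[float, float]:
    """One round of self-training: teacher -> pseudo labels -> robust student."""
    # 1. Train the teacher on labeled data only.
    teacher_w, teacher_b = fit_least_squares(labeled_x, labeled_y)
    # 2. The teacher produces pseudo labels for the unlabeled inputs.
    pseudo_y = [teacher_w * x + teacher_b for x in unlabeled_x]
    # 3. Train the student jointly on labeled and pseudo-labeled data
    #    with the robust (here: Huber) loss.
    all_x = list(labeled_x) + list(unlabeled_x)
    all_y = list(labeled_y) + pseudo_y
    return fit_huber(all_x, all_y, delta=delta)


if __name__ == "__main__":
    w, b = self_train([0.0, 1.0, 2.0, 3.0], [0.0, 2.0, 4.0, 6.0], [0.5, 1.5, 2.5])
    print(f"student slope={w:.3f}, intercept={b:.3f}")
```

In this toy setting the teacher and student share the same model family, so the pseudo labels add little; the sketch only shows where a robust loss enters the pipeline. With a stronger teacher and noisier pseudo labels, the clipping threshold delta controls how much any single wrong label can pull the student.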

DOI: http://dx.doi.org/10.1089/cmb.2023.0187

Similar Publications

Unsupervised Domain Adaptation With Synchronized Self-Training for Cross-Domain Motor Imagery Recognition.

IEEE J Biomed Health Inform

January 2025

Peiyin Chen Xiaofeng Liu Chao Ma He Wang Xiong Yang

Robust decoding performance is essential for the practical deployment of brain-computer interface (BCI) systems. Existing EEG decoding models often rely on large amounts of annotated data collected through specific experimental setups, which fail to address the heterogeneity of data distributions across different domains. This limitation hinders BCI systems from effectively managing the complexity and variability of real-world data.

Robust multi-label surgical tool classification in noisy endoscopic videos.

Sci Rep

February 2025

University of the West of England, Bristol, UK.

Adnan Qayyum Hassan Ali Massimo Caputo Hunaid Vohra Taofeek Akinosho

Over the past few years, surgical data science has attracted substantial interest from the machine learning (ML) community. Various studies have demonstrated the efficacy of emerging ML techniques in analysing surgical data, particularly recordings of procedures, for digitising clinical and non-clinical functions like preoperative planning, context-aware decision-making, and operating skill assessment. However, this field is still in its infancy and lacks representative, well-annotated datasets for training robust models in intermediate ML tasks.

Efficient diagnosis of retinal disorders using dual-branch semi-supervised learning (DB-SSL): An enhanced multi-class classification approach.

Comput Med Imaging Graph

April 2025

School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China; Key Laboratory of Knowledge Automation for Industrial Processes, Ministry of Education, University of Science and Technology Beijing, Beijing 100083, China.

Muhammad Hammad Malik Zishuo Wan Yu Gao Da-Wei Ding

The early diagnosis of retinal disorders is essential in preventing permanent or partial blindness. Identifying these conditions promptly guarantees early treatment and prevents blindness. However, the challenge lies in accurately diagnosing these conditions, especially with limited labeled data.

Energy-Based Domain Adaptation Without Intermediate Domain Dataset for Foggy Scene Segmentation.

IEEE Trans Image Process

October 2024

Donggon Jang Sunhyeok Lee Gyuwon Choi Yejin Lee Sanghyeok Son

Robust segmentation performance under dense fog is crucial for autonomous driving, but collecting labeled real foggy scene datasets is burdensome in the real world. To this end, existing methods have adapted models trained on labeled clear weather images to the unlabeled real foggy domain. However, these approaches require intermediate domain datasets (e.

A self-training interpretable cell type annotation framework using specific marker gene.

Bioinformatics

October 2024

School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China.

Hegang Chen Yuyin Lu Yanghui Rao

Motivation: Recent advances in sequencing technology provide opportunities to study biological processes at a higher resolution. Cell type annotation is an important step in scRNA-seq analysis, which often relies on established marker genes. However, most of the previous methods divide the identification of cell types into two stages, clustering and assignment, whose performances are susceptible to the clustering algorithm, and the marker information cannot effectively guide the clustering process.
