Molecular prediction tasks normally require a series of specialized experiments to label each target molecule, so labeled data are scarce. Self-training, a semisupervised learning paradigm, uses both labeled and unlabeled data. Specifically, a teacher model is trained on labeled data and produces pseudo labels for unlabeled data. The labeled and pseudo-labeled data are then used jointly to train a student model. However, the pseudo labels generated by the teacher model are generally not sufficiently accurate. We therefore propose a robust self-training strategy that uses robust loss functions to handle such noisy labels under two paradigms, generic and adaptive. We conducted experiments on three molecular biology prediction tasks with four backbone models to progressively evaluate the performance of the proposed strategy. The results demonstrate that the proposed method improves prediction performance across all tasks, most notably on molecular regression tasks, with an average improvement of 41.5%. Furthermore, visualization analysis confirms the superiority of our method. The proposed robust self-training is a simple yet effective strategy that efficiently improves molecular biology prediction performance. It addresses the shortage of labeled data in molecular biology by taking advantage of both labeled and unlabeled data. Moreover, it can be easily integrated into any prediction task, making it a general-purpose approach for the bioinformatics community.
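The teacher-student loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the one-dimensional linear model, the Huber loss (used here only as a stand-in for a "robust loss function"), and all function names and hyperparameters (`fit_least_squares`, `fit_huber`, `robust_self_training`, `delta`, `lr`, `epochs`) are assumptions chosen for illustration.

```python
def fit_least_squares(xs, ys):
    """Teacher: closed-form 1-D linear regression, returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def huber_grad(residual, delta):
    """Derivative of the Huber loss w.r.t. the residual.

    Equal to the residual near zero and clipped to +/- delta beyond it,
    so a badly wrong (pseudo) label has bounded influence on the update.
    """
    return max(-delta, min(delta, residual))


def fit_huber(xs, ys, delta=1.0, lr=0.05, epochs=200, init=(0.0, 0.0)):
    """Student: 1-D linear model trained by gradient descent on the Huber loss.

    The learning rate is an illustrative value; it may need tuning for
    inputs on a different scale.
    """
    slope, intercept = init
    n = len(xs)
    for _ in range(epochs):
        grad_slope = 0.0
        grad_intercept = 0.0
        for x, y in zip(xs, ys):
            g = huber_grad(slope * x + intercept - y, delta)
            grad_slope += g * x
            grad_intercept += g
        slope -= lr * grad_slope / n
        intercept -= lr * grad_intercept / n
    return slope, intercept


def robust_self_training(labeled, unlabeled_xs, delta=1.0):
    """One round of self-training with a robust student loss (assumed setup)."""
    labeled_xs, labeled_ys = zip(*labeled)
    # 1. Train the teacher on labeled data only.
    teacher = fit_least_squares(labeled_xs, labeled_ys)
    # 2. The teacher produces pseudo labels for the unlabeled inputs.
    pseudo_ys = [teacher[0] * x + teacher[1] for x in unlabeled_xs]
    # 3. Train the student jointly on labeled and pseudo-labeled data.
    all_xs = list(labeled_xs) + list(unlabeled_xs)
    all_ys = list(labeled_ys) + pseudo_ys
    student = fit_huber(all_xs, all_ys, delta=delta, init=teacher)
    return teacher, student


if __name__ == "__main__":
    labeled = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
    teacher, student = robust_self_training(labeled, [3.0, 4.0])
    print("teacher:", teacher, "student:", student)
```

With a linear teacher and a linear student the toy example is deliberately simple; its purpose is only to show where the pseudo labels enter and where the robust loss replaces a squared-error loss.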
DOI: http://dx.doi.org/10.1089/cmb.2023.0187
IEEE J Biomed Health Inform
January 2025
Robust decoding performance is essential for the practical deployment of brain-computer interface (BCI) systems. Existing EEG decoding models often rely on large amounts of annotated data collected through specific experimental setups, which fail to address the heterogeneity of data distributions across different domains. This limitation hinders BCI systems from effectively managing the complexity and variability of real-world data.
Over the past few years, surgical data science has attracted substantial interest from the machine learning (ML) community. Various studies have demonstrated the efficacy of emerging ML techniques in analysing surgical data, particularly recordings of procedures, for digitising clinical and non-clinical functions like preoperative planning, context-aware decision-making, and operating skill assessment. However, this field is still in its infancy and lacks representative, well-annotated datasets for training robust models in intermediate ML tasks.
Comput Med Imaging Graph
April 2025
School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China; Key Laboratory of Knowledge Automation for Industrial Processes, Ministry of Education, University of Science and Technology Beijing, Beijing 100083, China.
The early diagnosis of retinal disorders is essential in preventing permanent or partial blindness. Identifying these conditions promptly guarantees early treatment and prevents blindness. However, the challenge lies in accurately diagnosing these conditions, especially with limited labeled data.
IEEE Trans Image Process
October 2024
Robust segmentation performance under dense fog is crucial for autonomous driving, but collecting labeled real foggy scene datasets is burdensome in the real world. To this end, existing methods have adapted models trained on labeled clear weather images to the unlabeled real foggy domain. However, these approaches require intermediate domain datasets (e.
Bioinformatics
October 2024
School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China.
Motivation: Recent advances in sequencing technology provide opportunities to study biological processes at a higher resolution. Cell type annotation is an important step in scRNA-seq analysis, which often relies on established marker genes. However, most of the previous methods divide the identification of cell types into two stages, clustering and assignment, whose performances are susceptible to the clustering algorithm, and the marker information cannot effectively guide the clustering process.