Supervised learning requires class labels, but in many applications labels are corrupted or missing. In binary classification, for example, when only a subset of positive instances is labeled and the remaining instances are unlabeled, positive-unlabeled (PU) learning is needed to learn a model from both positive and unlabeled data. Similarly, when some instances are mislabeled, methods are needed for learning in the presence of class label noise (LN). Here we propose adaptive sampling (AdaSampling), a framework for both PU learning and learning with class LN. By iteratively estimating the class mislabeling probability with an adaptive sampling procedure, the proposed method progressively reduces the risk of selecting mislabeled instances for model training and subsequently constructs highly generalizable models, even when a large proportion of the instances in the data are mislabeled. We demonstrate the utility of the proposed method using simulation and benchmark data, and compare it with alternative approaches commonly used for PU learning and/or learning with LN. We then introduce two novel bioinformatics applications in which AdaSampling is used to 1) identify kinase substrates from mass spectrometry-based phosphoproteomics data and 2) predict transcription factor target genes by integrating various next-generation sequencing data.
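The iterative idea described in the abstract can be illustrated with a minimal sketch for the PU setting. This is not the authors' implementation: the function names (`fit_logistic`, `predict_pos`, `ada_sampling_pu`), the use of a small logistic regression as the base classifier, the number of iterations, and the choice to sample as many negatives as there are labeled positives are all assumptions made for illustration. The sketch keeps the labeled positives fixed, samples presumed negatives from the unlabeled set in proportion to their current estimated probability of being negative, retrains, and updates those probabilities.

```python
import math
import random


def predict_pos(w, b, x):
    """Probability that x is positive under a logistic model (w, b)."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, epochs=200, lr=0.1):
    """Stand-in base classifier: logistic regression by batch gradient descent."""
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            err = predict_pos(w, b, x) - y
            for j in range(dim):
                gw[j] += err * x[j]
            gb += err
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def ada_sampling_pu(positives, unlabeled, n_iter=5, seed=0):
    """Hypothetical sketch of adaptive sampling for PU data.

    p_neg[i] is the current estimate that unlabeled[i] is truly negative,
    i.e. one minus its estimated probability of being a hidden positive.
    """
    rng = random.Random(seed)
    p_neg = [1.0] * len(unlabeled)  # assumption: start with uniform weights
    w, b = [0.0] * len(positives[0]), 0.0
    for _ in range(n_iter):
        # Sample presumed negatives, favoring points that look negative so far.
        sampled = rng.choices(unlabeled, weights=p_neg, k=len(positives))
        xs = positives + sampled
        ys = [1.0] * len(positives) + [0.0] * len(sampled)
        w, b = fit_logistic(xs, ys)
        # Re-estimate how likely each unlabeled point is to be negative.
        p_neg = [1.0 - predict_pos(w, b, x) for x in unlabeled]
    return w, b, p_neg


# Toy 1-D data (synthetic, for illustration only): positives near +2, negatives near -2.
data_rng = random.Random(1)
true_pos = [[data_rng.gauss(2.0, 1.0)] for _ in range(100)]
true_neg = [[data_rng.gauss(-2.0, 1.0)] for _ in range(100)]
positives = true_pos[:50]             # labeled positives
unlabeled = true_pos[50:] + true_neg  # 50 hidden positives + 100 true negatives

w, b, p_neg = ada_sampling_pu(positives, unlabeled)
print("mean p_neg, hidden positives:", sum(p_neg[:50]) / 50)
print("mean p_neg, true negatives:  ", sum(p_neg[50:]) / 100)
```

On this toy data the hidden positives in the unlabeled set should end up with a lower estimated probability of being negative than the true negatives, so they are drawn less often as training negatives in later iterations. Any probabilistic classifier could replace the logistic regression used here.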

Source: http://dx.doi.org/10.1109/TCYB.2018.2816984


