Class labels are required for supervised learning but may be corrupted or missing in many applications. In binary classification, for example, when only a subset of positive instances is labeled and the rest are unlabeled, positive-unlabeled (PU) learning is required to build models from both positive and unlabeled data. Similarly, when class labels are corrupted by mislabeled instances, methods are needed for learning in the presence of class label noise (LN). Here we propose adaptive sampling (AdaSampling), a framework for both PU learning and learning with class LN. By iteratively estimating the class mislabeling probability with an adaptive sampling procedure, the proposed method progressively reduces the risk of selecting mislabeled instances for model training and subsequently constructs highly generalizable models even when a large proportion of mislabeled instances is present in the data. We demonstrate the utility of the proposed method using simulation and benchmark data, and compare it with alternative approaches that are commonly used for PU learning and/or learning with LN. We then introduce two novel bioinformatics applications in which AdaSampling is used to: 1) identify kinase substrates from mass spectrometry-based phosphoproteomics data and 2) predict transcription factor target genes by integrating various next-generation sequencing data.
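The iterative idea described above can be illustrated with a small sketch for the PU setting. This is not the authors' implementation: the nearest-centroid base classifier, the toy Gaussian data, and the names `ada_sampling_pu`, `prob_positive`, and `make_toy_data` are all assumptions chosen to keep the example short and dependency-free. It shows only the core loop: sample unlabeled instances as negatives in proportion to their current estimated probability of being negative, retrain, and update those probabilities.

```python
import math
import random


def centroid(points):
    """Mean of a non-empty list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def prob_positive(x, pos_c, neg_c):
    """Toy base classifier (an assumption, not from the paper):
    logistic score from squared distances to the two class centroids."""
    d_pos = (x[0] - pos_c[0]) ** 2 + (x[1] - pos_c[1]) ** 2
    d_neg = (x[0] - neg_c[0]) ** 2 + (x[1] - neg_c[1]) ** 2
    z = max(-30.0, min(30.0, 0.5 * (d_neg - d_pos)))  # clamp to avoid overflow
    return 1.0 / (1.0 + math.exp(-z))


def ada_sampling_pu(labeled_pos, unlabeled, n_iter=10, seed=0):
    """Simplified adaptive sampling loop for PU data.

    Returns, for each unlabeled instance, the estimated probability
    that it is a true negative.
    """
    rng = random.Random(seed)
    pos_c = centroid(labeled_pos)
    # Start by treating every unlabeled instance as negative.
    p_neg = [1.0] * len(unlabeled)
    for _ in range(n_iter):
        # Keep each unlabeled instance as a training negative with
        # probability equal to its current estimate of being negative.
        sampled = [x for x, p in zip(unlabeled, p_neg) if rng.random() < p]
        if not sampled:
            break
        neg_c = centroid(sampled)
        # Re-estimate the mislabeling probabilities with the retrained model.
        p_neg = [1.0 - prob_positive(x, pos_c, neg_c) for x in unlabeled]
    return p_neg


def make_toy_data(seed=0, n_pos=100, n_labeled=30, n_neg=100):
    """Hypothetical toy data: positives near (2, 2), negatives near (-2, -2)."""
    rng = random.Random(seed)
    pos = [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(n_pos)]
    neg = [(rng.gauss(-2, 1), rng.gauss(-2, 1)) for _ in range(n_neg)]
    labeled_pos = pos[:n_labeled]
    unlabeled = pos[n_labeled:] + neg
    # True for unlabeled instances that are actually positive.
    is_hidden_pos = [True] * (n_pos - n_labeled) + [False] * n_neg
    return labeled_pos, unlabeled, is_hidden_pos
```

Calling `ada_sampling_pu(*make_toy_data()[:2])` returns one probability per unlabeled instance. On well-separated data like this, the hidden positives should end up with lower estimated probabilities of being negative than the true negatives, so they are sampled less and less often as training negatives.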
DOI: http://dx.doi.org/10.1109/TCYB.2018.2816984