Theory and Algorithms for Shapelet-Based Multiple-Instance Learning.

Neural Comput

Department of Creative Informatics, University of Tokyo, and RIKEN Center for Advanced Intelligence Project, Bunkyo-ku, Tokyo, 1138656, Japan

Published: August 2020

We propose a new formulation of multiple-instance learning (MIL), in which a unit of data consists of a set of instances called a bag. The goal is to find a good classifier of bags based on their similarity to a "shapelet" (or pattern), where the similarity of a bag to a shapelet is the maximum similarity between the shapelet and any instance in the bag. In previous work, some of the training instances were chosen as shapelets with no theoretical justification. In our formulation, we use all possible, and thus infinitely many, shapelets, resulting in a richer class of classifiers. We show that the formulation is tractable: it can be reduced through linear programming boosting (LPBoost) to difference of convex (DC) programs of finite (in fact, polynomial) size. Our theoretical result also justifies the heuristics used in some previous work. The time complexity of the proposed algorithm depends heavily on the number of instances in the training sample. To handle data containing a large number of instances, we also propose a heuristic variant of the algorithm that retains the theoretical guarantee. Our empirical study demonstrates that our algorithm works uniformly across shapelet learning tasks on time-series classification and various MIL tasks, with accuracy comparable to that of existing methods. Moreover, we show that the proposed heuristic lets us obtain these results in reasonable computational time.
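The bag-to-shapelet similarity described in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: the Gaussian similarity, the function names (`gaussian_similarity`, `bag_similarity`, `classify_bag`), and the hand-picked finite set of shapelets, weights, and bias are all assumptions made for illustration, whereas the paper works with infinitely many shapelets and learns the classifier through LPBoost and DC programming.

```python
import math

def gaussian_similarity(x, z, sigma=1.0):
    # Assumed instance-level similarity (the abstract does not fix one).
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def bag_similarity(bag, shapelet, sim=gaussian_similarity):
    # Similarity of a bag to a shapelet: the maximum similarity
    # over the instances in the bag, as stated in the abstract.
    return max(sim(x, shapelet) for x in bag)

def classify_bag(bag, shapelets, weights, bias=0.0):
    # Hypothetical weighted vote over a finite list of shapelets;
    # the weights and bias here are chosen by hand, not learned.
    score = sum(w * bag_similarity(bag, z) for w, z in zip(weights, shapelets))
    return 1 if score + bias >= 0 else -1

# A bag is a collection of instances (here, 2-D points).
positive_bag = [(0.0, 0.0), (1.0, 1.0)]
negative_bag = [(5.0, 5.0)]
shapelet = (1.0, 1.0)

print(bag_similarity(positive_bag, shapelet))
print(classify_bag(positive_bag, [shapelet], [1.0], bias=-0.5))
print(classify_bag(negative_bag, [shapelet], [1.0], bias=-0.5))
```

Because the bag-level similarity is a maximum, a single instance close to the shapelet is enough to give the whole bag a high score, which matches the usual MIL intuition that one relevant instance can determine the label of its bag.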


Source
http://dx.doi.org/10.1162/neco_a_01297


