Identification of disease subtypes and corresponding biomarkers can substantially improve clinical diagnosis and treatment selection. Discovering these subtypes in noisy, high-dimensional biomedical data is often impossible for humans and challenging for machines. We introduce a new approach to facilitate the discovery of disease subtypes: instead of analyzing the original data, we train a diagnostic classifier (healthy vs. diseased) and extract instance-wise explanations for the classifier's decisions. The distribution of instances in the explanation space of our diagnostic classifier amplifies the different reasons for belonging to the same class, resulting in a representation that is uniquely useful for discovering latent subtypes. We compare our ability to recover subtypes via cluster analysis on model explanations to classical cluster analysis on the original data. In multiple datasets with known ground-truth subclasses, notably UK Biobank brain imaging data and transcriptome data from The Cancer Genome Atlas, we show that cluster analysis on model explanations substantially outperforms the classical approach. While we believe clustering in explanation space to be particularly valuable for inferring disease subtypes, the method is more general and applicable to any kind of subtype identification.
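
The pipeline described above (train a diagnostic classifier, extract instance-wise explanations, cluster the explanations rather than the raw data) can be illustrated with a minimal, dependency-free sketch. Everything in it is an assumption made for illustration: the synthetic data, the logistic-regression classifier, the weight-times-centered-input attribution, and the two-cluster k-means are stand-ins, since the abstract does not specify the classifier, the explanation method, or the clustering algorithm that were actually used.

```python
import math
import random

rng = random.Random(0)
N_NOISE = 8  # uninformative, high-variance features (illustrative choice)

def sample(marker=None):
    """One instance: 2 signal features (std 1) followed by noise features (std 3)."""
    x = [rng.gauss(0, 1) for _ in range(2)] + [rng.gauss(0, 3) for _ in range(N_NOISE)]
    if marker is not None:
        x[marker] += 3.0  # each hypothetical subtype shifts a different signal feature
    return x

healthy = [sample() for _ in range(200)]
subtype = [i % 2 for i in range(200)]        # hidden ground truth, used only for scoring
diseased = [sample(s) for s in subtype]
X, y = healthy + diseased, [0] * 200 + [1] * 200

def train_logreg(X, y, lr=0.1, epochs=200):
    """Batch gradient descent for logistic regression (healthy vs. diseased)."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for x, t in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))  # clamp for stability
            for j in range(d):
                gw[j] += (p - t) * x[j]
            gb += p - t
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def kmeans2(points, iters=50):
    """Lloyd's algorithm for k=2 with a deterministic farthest-point start."""
    dist2 = lambda a, c: sum((ai - ci) ** 2 for ai, ci in zip(a, c))
    centers = [points[0], max(points, key=lambda p: dist2(p, points[0]))]
    labels = []
    for _ in range(iters):
        labels = [0 if dist2(p, centers[0]) <= dist2(p, centers[1]) else 1 for p in points]
        for k in (0, 1):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:  # keep the old center if a cluster empties
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def agreement(labels, truth):
    """Fraction of matching labels, maximized over the two possible cluster namings."""
    hits = sum(l == t for l, t in zip(labels, truth))
    return max(hits, len(truth) - hits) / len(truth)

w, b = train_logreg(X, y)
means = [sum(col) / len(X) for col in zip(*X)]

# Instance-wise explanation: per-feature contribution to the classifier's logit.
expl = [[wj * (xj - mj) for wj, xj, mj in zip(w, x, means)] for x in diseased]

acc_raw = agreement(kmeans2(diseased), subtype)   # cluster the original data
acc_expl = agreement(kmeans2(expl), subtype)      # cluster the explanations
print(f"subtype recovery, raw data:          {acc_raw:.2f}")
print(f"subtype recovery, explanation space: {acc_expl:.2f}")
```

In this toy setting the classifier assigns near-zero weight to the noise features, so their contributions shrink in explanation space while the subtype-specific markers remain prominent. With a linear model the explanation space is only a rescaling of the input features; nonlinear classifiers and attribution methods would yield a richer transformation.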


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7393364
DOI: http://dx.doi.org/10.1038/s41598-020-68858-7


