Personalized regression enables sample-specific pan-cancer analysis.

Bioinformatics

Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, USA.

Published: July 2018

Motivation: In many applications, inter-sample heterogeneity is crucial to understanding the complex biological processes under study. For example, in genomic analysis of cancers, each patient in a cohort may have a different driver mutation, making it difficult or impossible to identify causal mutations from an averaged view of the entire cohort. Unfortunately, many traditional methods for genomic analysis seek to estimate a single model which is shared by all samples in a population, ignoring this inter-sample heterogeneity entirely. In order to better understand patient heterogeneity, it is necessary to develop practical, personalized statistical models.

Results: To uncover this inter-sample heterogeneity, we propose a novel regularizer for achieving patient-specific personalized estimation. This regularizer operates by learning two latent distance metrics-one between personalized parameters and one between clinical covariates-and attempting to match the induced distances as closely as possible. Crucially, we do not assume these distance metrics are already known. Instead, we allow the data to dictate the structure of these latent distance metrics. Finally, we apply our method to learn patient-specific, interpretable models for a pan-cancer gene expression dataset containing samples from more than 30 distinct cancer types and find strong evidence of personalization effects between cancer types as well as between individuals. Our analysis uncovers sample-specific aberrations that are overlooked by population-level methods, suggesting a promising new path for precision analysis of complex diseases such as cancer.

Availability And Implementation: Software for personalized linear and personalized logistic regression, along with code to reproduce experimental results, is freely available at github.com/blengerich/personalized_regression.

Download full-text PDF

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6022603PMC
http://dx.doi.org/10.1093/bioinformatics/bty250DOI Listing

Publication Analysis

Top Keywords

inter-sample heterogeneity
12
genomic analysis
8
latent distance
8
distance metrics
8
cancer types
8
personalized
6
analysis
5
personalized regression
4
regression enables
4
enables sample-specific
4

Similar Publications

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!