A variable selection approach for highly correlated predictors in high-dimensional genomic data.

Bioinformatics

Biostatistics and Programming Department, Sanofi R&D, Chilly Mazarin 91380, France.

Published: August 2021

Motivation: In genomic studies, identifying biomarkers associated with a variable of interest is a major concern in biomedical research. Regularized approaches are classically used to perform variable selection in high-dimensional linear models. However, these methods can fail in highly correlated settings.

Results: We propose a novel variable selection approach called WLasso, which takes these correlations into account. It consists of rewriting the initial high-dimensional linear model to remove the correlation between the biomarkers (predictors) and then applying the generalized Lasso criterion. The performance of WLasso is assessed on synthetic data in several scenarios and compared with recent alternative approaches. The results show that when the biomarkers are highly correlated, WLasso outperforms the other approaches in sparse high-dimensional frameworks. The method is also illustrated on publicly available gene expression data in breast cancer.
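The abstract does not spell out how the model is rewritten. As a minimal sketch of one standard way to decorrelate predictors, assume the predictors have a known correlation matrix Sigma and set X_tilde = X Sigma^{-1/2} and beta_tilde = Sigma^{1/2} beta, so that y = X beta + eps = X_tilde beta_tilde + eps. Under that assumption, an L1 penalty on beta becomes a penalty on Sigma^{-1/2} beta_tilde, which has the generalized Lasso form. The equicorrelated Sigma, the sample sizes, and the helper name `inverse_sqrt` below are illustrative choices, not taken from the paper or from the WLasso package.

```python
import numpy as np


def inverse_sqrt(sigma):
    """Return Sigma^{-1/2} for a symmetric positive-definite matrix (hypothetical helper)."""
    eigvals, eigvecs = np.linalg.eigh(sigma)
    return eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T


rng = np.random.default_rng(0)
n, p, rho = 200, 5, 0.9

# Assumed correlation structure: every pair of predictors has correlation rho.
sigma = np.full((p, p), rho) + (1 - rho) * np.eye(p)

# Simulate correlated predictors and a sparse coefficient vector.
X = rng.multivariate_normal(np.zeros(p), sigma, size=n)
beta = np.array([2.0, 0.0, 0.0, -1.5, 0.0])
y = X @ beta + rng.normal(scale=0.5, size=n)

# Rewrite the model: y = X beta + eps = X_tilde beta_tilde + eps.
sigma_inv_sqrt = inverse_sqrt(sigma)
X_tilde = X @ sigma_inv_sqrt
beta_tilde = np.linalg.inv(sigma_inv_sqrt) @ beta  # Sigma^{1/2} beta

# The signal is unchanged by the reparameterization.
same_signal = np.allclose(X @ beta, X_tilde @ beta_tilde)

# At the population level the transformed predictors are uncorrelated:
# Sigma^{-1/2} Sigma Sigma^{-1/2} = I.
whitened_cov = sigma_inv_sqrt @ sigma @ sigma_inv_sqrt
```

In practice Sigma is unknown and has to be estimated, which is difficult when the number of predictors exceeds the number of samples; this sketch sidesteps that by using the true matrix.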

Availability and Implementation: Our method is implemented in the WLasso R package, which is available from the Comprehensive R Archive Network (CRAN).

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
http://dx.doi.org/10.1093/bioinformatics/btab114
