Nearest-neighbor Projected-Distance Regression (NPDR) for detecting network interactions with adjustments for multiple tests and confounding.

Trang T Le, Bryan A Dawkins, Brett A McKinney

Bioinformatics

Department of Mathematics.

Published: May 2020

Machine learning feature selection methods are essential for analyzing complex data from studies like GWAS and neuroimaging, especially when dealing with high-dimensional datasets and controlling false discoveries.
Nearest-neighbor Projected-Distance Regression (NPDR) scores each predictor by regressing distances between nearest-neighbor pairs, projected onto that predictor, in a generalized linear model, which allows covariate adjustment and statistical inference for both binary and continuous outcomes.
In simulations, NPDR achieves better precision-recall than standard Relief-based feature selection and random forest importance; in RNA-Seq data from a study of major depressive disorder, covariate adjustment removes spurious associations due to confounding, and an eQTL analysis identifies potentially interacting variants.

Summary: Machine learning feature selection methods are needed to detect complex interaction-network effects in complicated modeling scenarios in high-dimensional data, such as GWAS, gene expression, eQTL and structural/functional neuroimage studies for case-control or continuous outcomes. In addition, many machine learning methods have limited ability to address the issues of controlling false discoveries and adjusting for covariates. To address these challenges, we develop a new feature selection technique called Nearest-neighbor Projected-Distance Regression (NPDR) that calculates the importance of each predictor using generalized linear model regression of distances between nearest-neighbor pairs projected onto the predictor dimension. NPDR captures the underlying interaction structure of data using nearest-neighbors in high dimensions, handles both dichotomous and continuous outcomes and predictor data types, statistically corrects for covariates, and permits statistical inference and penalized regression. We use realistic simulations with interactions and other effects to show that NPDR has better precision-recall than standard Relief-based feature selection and random forest importance, with the additional benefit of covariate adjustment and multiple testing correction. Using RNA-Seq data from a study of major depressive disorder (MDD), we show that NPDR with covariate adjustment removes spurious associations due to confounding. We apply NPDR to eQTL data to identify potentially interacting variants that regulate transcripts associated with MDD and demonstrate NPDR's utility for GWAS and continuous outcomes.
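
The core idea in the summary, regressing nearest-neighbor pair distances projected onto each predictor, can be illustrated with a minimal sketch for a continuous outcome. This is not the authors' implementation (the npdr package at the URL below is the reference). The function names `knn_pairs`, `slope_t_stat` and `npdr_scores`, the Manhattan metric, the fixed k, and the toy data are all assumptions made for illustration. The sketch also omits covariates and multiple-testing correction, and its t-statistics are only approximate because neighbor pairs are not independent.

```python
import math
import random


def knn_pairs(X, k):
    """Return (i, j) pairs where j is one of the k nearest neighbors of i.

    Distances use the Manhattan metric over all predictors (an assumption).
    """
    n = len(X)
    pairs = []
    for i in range(n):
        dists = sorted(
            (sum(abs(a - b) for a, b in zip(X[i], X[j])), j)
            for j in range(n)
            if j != i
        )
        pairs.extend((i, j) for _, j in dists[:k])
    return pairs


def slope_t_stat(x, y):
    """Ordinary least-squares slope and its t-statistic for y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return b1, b1 / se


def npdr_scores(X, y, k=10):
    """Score each predictor by regressing |dy| on |dx_a| over neighbor pairs."""
    pairs = knn_pairs(X, k)
    dy = [abs(y[i] - y[j]) for i, j in pairs]
    scores = []
    for a in range(len(X[0])):
        # Pair distance projected onto predictor a alone
        dx = [abs(X[i][a] - X[j][a]) for i, j in pairs]
        scores.append(slope_t_stat(dx, dy))
    return scores


# Toy data (hypothetical): predictor 0 drives the outcome, the rest are noise.
rng = random.Random(0)
X = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(200)]
y = [3.0 * row[0] + rng.gauss(0, 0.1) for row in X]

scores = npdr_scores(X, y, k=10)
for a, (beta, t) in enumerate(scores):
    print(f"predictor {a}: beta = {beta:.3f}, t = {t:.2f}")
```

For a dichotomous outcome, the same projected distances would instead enter a logistic regression on whether each neighbor pair belongs to different classes, and covariate differences between neighbors could be added as further regression terms.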

Availability and Implementation: https://insilico.github.io/npdr/.

Supplementary Information: Supplementary data are available at Bioinformatics online.

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8453237
DOI: http://dx.doi.org/10.1093/bioinformatics/btaa024
