Using recursive feature elimination in random forest to account for correlated variables in high dimensional data.

BMC Genet

Department of Population Health Sciences, School of Medicine and Public Health, University of Wisconsin, 610 Walnut Street, 1007 WARF, Madison, WI, 53726, USA.

Published: September 2018

Background: Random forest (RF) is a machine-learning method that generally works well with high-dimensional problems and allows for nonlinear relationships between predictors; however, the presence of correlated predictors has been shown to impact its ability to identify strong predictors. The Random Forest-Recursive Feature Elimination algorithm (RF-RFE) mitigates this problem in smaller data sets, but this approach has not been tested in high-dimensional omics data sets.
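The RF-RFE idea summarized above (refit, rank variables by importance, drop the least important, repeat) can be sketched as a generic loop. This is a minimal sketch under stated assumptions, not the authors' implementation. `importance_fn` is a hypothetical callable that, in real RF-RFE, would refit a random forest on the surviving variables and return its importance scores. `drop_fraction` and `min_features` are assumed parameters. `toy_importance` is a deliberately simple, hypothetical stand-in (not a random forest) that splits each causal signal's credit evenly among the surviving variables that carry it, to mimic importance being shared by correlated copies.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical interface: given the surviving variables, return an importance
# score per variable. In real RF-RFE this would refit a random forest.
ImportanceFn = Callable[[Sequence[str]], Dict[str, float]]


def recursive_feature_elimination(
    features: Sequence[str],
    importance_fn: ImportanceFn,
    drop_fraction: float = 0.5,
    min_features: int = 1,
) -> List[List[str]]:
    """Return the variables, ranked by importance, kept at each iteration."""
    if not 0 < drop_fraction < 1:
        raise ValueError("drop_fraction must be in (0, 1)")
    current = list(features)
    history: List[List[str]] = []
    while True:
        # Importance is recomputed every iteration, so a variable's score
        # can change once correlated competitors have been removed.
        scores = importance_fn(current)
        ranked = sorted(current, key=lambda f: scores[f], reverse=True)
        history.append(ranked)
        if len(ranked) <= min_features:
            break
        # Drop at least one variable, but never go below min_features.
        n_drop = max(1, int(len(ranked) * drop_fraction))
        current = ranked[: max(min_features, len(ranked) - n_drop)]
    return history


# Hypothetical toy data: signal strengths and which signal each variable carries.
SIGNAL = {"A": 1.0, "B": 0.3}
GROUP: Dict[str, Optional[str]] = {
    "a1": "A", "a2": "A", "a3": "A",  # three correlated copies of signal A
    "b1": "B",                        # a weaker, uncorrelated causal variable
    "n1": None, "n2": None,           # noise variables
}


def toy_importance(features: Sequence[str]) -> Dict[str, float]:
    """Split each signal's credit evenly among its surviving carriers."""
    counts: Dict[str, int] = {}
    for f in features:
        g = GROUP[f]
        if g is not None:
            counts[g] = counts.get(g, 0) + 1
    return {
        f: SIGNAL[GROUP[f]] / counts[GROUP[f]] if GROUP[f] is not None else 0.0
        for f in features
    }


if __name__ == "__main__":
    history = recursive_feature_elimination(
        list(GROUP), toy_importance, drop_fraction=0.5, min_features=2
    )
    for step, kept in enumerate(history):
        print(step, kept)
```

In this toy run the weaker causal variable `b1` is eliminated in the first round, because each of the three correlated copies of signal A still outranks it even after their shared credit is divided. This only illustrates how rank-based elimination interacts with shared importance; it is not a reproduction of the study's simulation.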

Results: We integrated 202,919 genotypes and 153,422 methylation sites in 680 individuals, and compared the abilities of RF and RF-RFE to detect simulated causal associations, which included simulated genotype-methylation interactions, between these variables and triglyceride levels. Results show that RF was able to identify strong causal variables with a few highly correlated variables, but it did not detect other causal variables.
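One general mechanism, not an analysis from this study, helps explain why correlated variables travel with strong causal ones in RF. At each split, RF considers only a random subset of `mtry` candidate variables drawn without replacement from the `p` predictors. Adding `k` near-duplicate copies of a strong predictor makes it more likely that at least one copy is available at a split, while the resulting splits, and hence the importance, are divided among the copies. The sketch below computes that probability. The function name and the values of `p`, `mtry`, and `k` are hypothetical illustrations, not settings used by the authors.

```python
from math import comb


def prob_any_copy_in_candidates(p: int, mtry: int, k: int) -> float:
    """Probability that at least one of k specific variables is among the
    mtry split candidates drawn without replacement from p variables."""
    if not (0 < mtry <= p and 0 <= k <= p):
        raise ValueError("require 0 < mtry <= p and 0 <= k <= p")
    # Complement of "none of the k copies is drawn".
    return 1 - comb(p - k, mtry) / comb(p, mtry)


if __name__ == "__main__":
    p, mtry = 1000, 31  # hypothetical values chosen for illustration only
    for k in (1, 5, 20):
        print(k, round(prob_any_copy_in_candidates(p, mtry, k), 3))
```

With `k = 1` the result reduces to `mtry / p`. The calculation says only how often some copy is a candidate, not which copy wins the split, so it is a heuristic for importance sharing rather than a model of RF importance itself.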

Conclusions: Although RF-RFE decreased the importance of correlated variables, in the presence of many correlated variables, it also decreased the importance of causal variables, making both hard to detect. These findings suggest that RF-RFE may not scale to high-dimensional data.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6157185
DOI: http://dx.doi.org/10.1186/s12863-018-0633-8

Similar Publications

Aim: To investigate the detection and initial management of first psychotic episodes, as well as established schizophrenia, within the primary care of the Andalusian Health System.

Background: Delay in detecting and treating psychosis is associated with slower recovery, higher relapse risk, and poorer long-term outcomes. Often, psychotic episodes go unnoticed for years before a diagnosis is established.

Prediction of dry matter intake in growing Black Bengal goats using artificial neural networks.

Trop Anim Health Prod

January 2025

Livestock Production and Management Section, ICAR-Indian Veterinary Research Institute, Izatnagar, Bareilly, Uttar Pradesh, 243 122, India.

Dry matter intake (DMI) determination is essential for effective management of meat goats, especially in optimizing feed utilization and production efficiency. Unfortunately, farmers often face challenges in accurately predicting DMI, which leads to feed wastage and increased production costs. This investigation aimed to predict DMI in Black Bengal goats from body weight (BW), body condition score (BCS), average daily gain (ADG), and metabolic body weight (MBW) by applying an artificial neural network (ANN) model.

Distinct seasonality of nutrients in twigs and leaves of temperate trees.

Tree Physiol

January 2025

School of Natural Resources, Faculty of Geographical Science, Beijing Normal University, Beijing 100875, China.

Seasonal variations of nutrients in different organs are an essential strategy for temperate trees to maintain growth and function. The seasonal variations and variability (i.e.

Background And Objectives: Social isolation is an increasing public health concern. Older residents in subsidized housing may be susceptible to isolation given high rates of chronic illness/disabilities, low income, and living alone. This cross-sectional study examined correlates of social isolation among over 3,000 older adults from nearly 100 subsidized housing communities across the US.

Multidimensional Classification and Prediction of Outcome Following Traumatic Brain Injury.

J Head Trauma Rehabil

January 2025

Author Affiliations: Monash-Epworth Rehabilitation Research Centre, School of Psychological Sciences, Monash University, Melbourne, Victoria, Australia (Prof Ponsford and Drs Spitz, Pyman, Carrier, Hicks, and Nguyen); Department of Neuroscience, Central Clinical School, Monash University, Melbourne, Victoria, Australia (Dr Spitz); TIRR Memorial Hermann Research Center Houston, Texas (Drs Sander and Sherer); and H. Ben Taub Department of Physical Medicine and Rehabilitation, Baylor College of Medicine & Harris Health System, Houston, Texas (Drs Sander and Sherer).

Objectives: This study aimed to identify outcome clusters among individuals with traumatic brain injury (TBI), 6 months to 10 years post-injury, in an Australian rehabilitation sample, and determine whether scores on 12 dimensions, combined with demographic and injury severity variables, could predict outcome cluster membership 1 to 3 years post-injury.

Setting: Rehabilitation hospital.

Participants: A total of 467 individuals with TBI, aged 17 to 87 (M = 44.