Forward variable selection for random forest models.

J Appl Stat

Department of Applied Mathematics, Delft University of Technology, Delft, The Netherlands.

Published: July 2022

Random forest is a popular prediction approach for handling high dimensional covariates. However, it often becomes infeasible to interpret the resulting high dimensional, non-parametric model. Aiming for an interpretable predictive model, we develop a forward variable selection method using the continuous ranked probability score (CRPS) as the loss function. Our stepwise procedure selects at each step the variable that minimizes the CRPS risk, and a stopping criterion for selection is designed based on an estimate of the CRPS risk difference between two consecutive steps. We provide mathematical motivation for our method by proving that, in a population sense, the method attains the optimal set. In a simulation study, we compare the performance of our method with an existing variable selection method across different sample sizes and correlation strengths among the covariates. Our method is observed to have a much lower false positive rate. We also demonstrate an application of our method to statistical post-processing of daily maximum temperature forecasts in the Netherlands. Our method selects about 10% of the covariates while retaining the same predictive power.
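The stepwise procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `crps_ensemble`, `forward_select` and `toy_risk`, the `min_gain` threshold, and the toy risk values are all hypothetical. The fixed threshold is a simple stand-in for the paper's stopping criterion, which is based on an estimate of the CRPS risk difference. In practice the `risk` callable would return an estimated CRPS risk of a random forest fitted on the given covariate subset.

```python
from typing import Callable, FrozenSet, List, Sequence


def crps_ensemble(members: Sequence[float], y: float) -> float:
    """CRPS of the empirical distribution of `members` against observation y.

    Uses the identity CRPS(F, y) = E|X - y| - 0.5 * E|X - X'|,
    where X and X' are independent draws from F.
    """
    m = len(members)
    term_obs = sum(abs(x - y) for x in members) / m
    term_spread = sum(abs(a - b) for a in members for b in members) / (m * m)
    return term_obs - 0.5 * term_spread


def forward_select(
    candidates: Sequence[str],
    risk: Callable[[FrozenSet[str]], float],
    min_gain: float = 0.0,
) -> List[str]:
    """Greedy forward selection driven by a user-supplied risk estimate.

    At each step, add the candidate whose inclusion gives the lowest risk.
    Stop when the risk reduction between consecutive steps is <= min_gain
    (an assumed threshold rule, not the paper's estimated criterion).
    """
    selected: List[str] = []
    remaining = list(candidates)
    current = risk(frozenset())
    while remaining:
        scored = [(risk(frozenset(selected + [v])), v) for v in remaining]
        best_risk, best_var = min(scored)
        if current - best_risk <= min_gain:
            break
        selected.append(best_var)
        remaining.remove(best_var)
        current = best_risk
    return selected


def toy_risk(subset: FrozenSet[str]) -> float:
    """Hypothetical risk table: x1 and x2 are informative, x3 is noise."""
    return 1.0 - 0.5 * ("x1" in subset) - 0.3 * ("x2" in subset)


chosen = forward_select(["x1", "x2", "x3"], toy_risk, min_gain=0.01)
```

With the toy risk above, the loop adds `x1`, then `x2`, and stops before adding the uninformative `x3`, because the risk no longer decreases by more than `min_gain`.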

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10503461
DOI: http://dx.doi.org/10.1080/02664763.2022.2095362

