The revival of the Gini importance?

Bioinformatics

Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Campus Lübeck, Lübeck, Germany.

Published: November 2018

Motivation: Random forests are fast, flexible and robust for analyzing high-dimensional data. A key advantage over alternative machine learning algorithms is the availability of variable importance measures, which can be used to identify relevant features or perform variable selection. Measures based on the impurity reduction of splits, such as the Gini importance, are popular because they are simple and fast to compute. However, they are biased in favor of variables with many possible split points and high minor allele frequency.
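The split-point bias can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes a binary outcome, predictors that are pure noise, and a single split rather than a full forest, and the function names (`best_gini_gain`, `mean_null_gain`) are hypothetical. It covers only the number of split points, not minor allele frequency.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_gini_gain(x, y):
    """Largest Gini impurity reduction over all threshold splits of x."""
    n = len(y)
    parent = gini(y)
    total_ones = sum(y)
    pairs = sorted(zip(x, y))
    best = 0.0
    left_ones = 0
    for i in range(1, n):
        left_ones += pairs[i - 1][1]
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal values cannot be separated by a threshold
        p_left = left_ones / i
        p_right = (total_ones - left_ones) / (n - i)
        # Child impurity, weighted by the share of samples in each child.
        child = (i * 2.0 * p_left * (1.0 - p_left)
                 + (n - i) * 2.0 * p_right * (1.0 - p_right)) / n
        best = max(best, parent - child)
    return best


def mean_null_gain(n_levels, n_samples=100, n_reps=200, seed=1):
    """Average best-split gain of a noise predictor with n_levels values."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_reps):
        y = [rng.randint(0, 1) for _ in range(n_samples)]
        x = [rng.randrange(n_levels) for _ in range(n_samples)]  # unrelated to y
        total += best_gini_gain(x, y)
    return total / n_reps


if __name__ == "__main__":
    for k in (2, 3, 10, 100):
        print(k, round(mean_null_gain(k), 4))
```

Although none of the simulated predictors carries any information about the outcome, the average best-split gain grows with the number of distinct values, because more candidate thresholds give more chances to find a spuriously good split.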

Results: We set up a fast approach to debias impurity-based variable importance measures for classification, regression and survival forests. We show that it creates a variable importance measure which is unbiased with regard to the number of categories and minor allele frequency, and which is almost as fast as the standard impurity importance. As a result, it is now possible to compute reliable importance estimates without the extra computing cost of permutations. Further, we combine the importance measure with a fast testing procedure, producing p-values for variable importance with almost no computational overhead beyond growing the random forest itself. Applications to gene expression and genome-wide association data show that the proposed method is powerful and computationally efficient.
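This abstract does not spell out how the correction works. As a rough intuition for what "debiasing" an impurity measure can mean, the sketch below uses one generic idea, chosen here purely as an assumption: subtract the gain achieved by a randomly permuted copy of the same variable. It works on a single split with a binary outcome, the function names are hypothetical, and it should not be read as the procedure implemented in ranger.

```python
import random


def best_gini_gain(x, y):
    """Largest Gini impurity reduction over all threshold splits of x."""
    n = len(y)
    total_ones = sum(y)
    p = total_ones / n
    parent = 2.0 * p * (1.0 - p)
    pairs = sorted(zip(x, y))
    best, left_ones = 0.0, 0
    for i in range(1, n):
        left_ones += pairs[i - 1][1]
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal values cannot be separated by a threshold
        p_left = left_ones / i
        p_right = (total_ones - left_ones) / (n - i)
        child = (i * 2.0 * p_left * (1.0 - p_left)
                 + (n - i) * 2.0 * p_right * (1.0 - p_right)) / n
        best = max(best, parent - child)
    return best


def corrected_gain(x, y, rng):
    """Gain of x minus the gain of a permuted copy of x (illustrative assumption)."""
    x_perm = x[:]
    rng.shuffle(x_perm)  # breaks any association with y, keeps the value distribution
    return best_gini_gain(x, y) - best_gini_gain(x_perm, y)


def mean_corrected_gain(n_levels, informative, n_samples=100, n_reps=300, seed=1):
    """Average corrected gain for a noise or an informative predictor."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_reps):
        x = [rng.randrange(n_levels) for _ in range(n_samples)]
        if informative:
            # Outcome follows the upper half of x, with 10% of labels flipped.
            y = [int(v >= n_levels / 2) ^ int(rng.random() < 0.1) for v in x]
        else:
            y = [rng.randint(0, 1) for _ in x]
        total += corrected_gain(x, y, rng)
    return total / n_reps


if __name__ == "__main__":
    print("noise, 50 levels:", mean_corrected_gain(50, informative=False))
    print("signal, 50 levels:", mean_corrected_gain(50, informative=True))
```

Under these assumptions the corrected gain of a noise predictor averages out near zero regardless of how many values it takes, while an informative predictor keeps a clearly positive value. Individual corrected gains can be negative, which is expected for a measure centered on zero under the null.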

Availability And Implementation: The procedure is included in the ranger package, available at https://cran.r-project.org/package=ranger and https://github.com/imbs-hl/ranger.

Supplementary Information: Supplementary data are available at Bioinformatics online.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6198850
DOI: http://dx.doi.org/10.1093/bioinformatics/bty373


