Calibrating random forests for probability estimation.

Stat Med

Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Campus Lübeck, Lübeck, Germany.

Published: September 2016

Probabilities can be consistently estimated using random forests. It is, however, unclear how random forests should be updated to make predictions for other centers or at different time points. In this work, we present two approaches for updating random forests for probability estimation. The first method has been proposed by Elkan and may be used for updating any machine learning approach yielding consistent probabilities, so-called probability machines. The second approach is a new strategy specifically developed for random forests. Using the terminal nodes, which represent conditional probabilities, the random forest is first translated to logistic regression models. These are, in turn, used for re-calibration. The two updating strategies were compared in a simulation study and are illustrated with data from the German Stroke Study Collaboration. In most simulation scenarios, both methods led to similar improvements. In the simulation scenario in which the stricter assumptions of Elkan's method were not met, the logistic regression-based re-calibration approach for random forests outperformed Elkan's method. It also performed better on the stroke data than Elkan's method. The strength of Elkan's method is its general applicability to any probability machine. However, if the strict assumptions underlying this approach are not met, the logistic regression-based approach is preferable for updating random forests for probability estimation. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
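The abstract does not spell out the updating formula. As a minimal sketch, assuming "Elkan's method" refers to the usual base-rate (prior-shift) adjustment, a probability estimated under one outcome prevalence can be rescaled to another prevalence by reweighting the odds. The function name `adjust_for_base_rate` and its parameters are hypothetical and are not taken from the paper. The sketch also assumes that only the outcome prevalence differs between the original and the new setting, which may be the kind of strict assumption the abstract mentions.

```python
def adjust_for_base_rate(p, b_old, b_new):
    """Rescale a probability estimate p from base rate b_old to base rate b_new.

    Illustrative sketch only (names are hypothetical). It assumes the
    distribution of the predictors within each outcome class is unchanged
    and only the outcome prevalence shifts.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if not (0.0 < b_old < 1.0 and 0.0 < b_new < 1.0):
        raise ValueError("base rates must lie strictly between 0 and 1")

    # Weight for the positive class under the new prevalence.
    pos = p * b_new * (1.0 - b_old)
    # Weight for the negative class under the new prevalence.
    neg = (1.0 - p) * b_old * (1.0 - b_new)
    return pos / (pos + neg)


if __name__ == "__main__":
    # A forest trained where half of the patients had the outcome,
    # applied at a center where the prevalence is assumed to be 10%.
    for p in (0.2, 0.5, 0.8):
        print(p, round(adjust_for_base_rate(p, b_old=0.5, b_new=0.1), 4))
```

This form is algebraically equivalent to p' = b'(p - pb) / (b - pb + b'p - bb'), with b the old and b' the new base rate. It needs no refitting, so it can be applied to the output of any probability machine, whereas the logistic regression-based approach described in the abstract is specific to random forests.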


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5074325
DOI: http://dx.doi.org/10.1002/sim.6959

