Improved high-dimensional prediction with Random Forests by the use of co-data.

BMC Bioinformatics

Department of Epidemiology and Biostatistics, VU University Medical Center, Amsterdam, 1007 MB, The Netherlands.

Published: December 2017

Background: Prediction in high-dimensional settings is difficult because the number of variables is large relative to the sample size. We demonstrate how auxiliary 'co-data' can be used to improve the performance of a Random Forest in such a setting.

Results: Co-data are incorporated in the Random Forest by replacing the uniform sampling probabilities used to draw candidate variables with co-data moderated sampling probabilities. Co-data are defined here as any type of information that is available on the variables of the primary data but does not use its response labels. Inspired by empirical Bayes, these moderated sampling probabilities are learned from the data at hand. We demonstrate the co-data moderated Random Forest (CoRF) with two examples. In the first example, we aim to predict the presence of a lymph node metastasis from gene expression data. We show how a set of external p-values, a gene signature, and the correlation between gene expression and DNA copy number can improve predictive performance. In the second example, we show how the prediction of cervical (pre-)cancer from methylation data can be improved by including the location of each probe relative to known CpG islands, the number of CpG sites targeted by a probe, and a set of p-values from a related study.
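The abstract describes the mechanism only at a high level. The Python sketch below illustrates its two ingredients under simplifying assumptions: (1) estimating non-uniform sampling probabilities from a base forest grown with uniform sampling, and (2) drawing the candidate variables at a split according to those probabilities instead of uniformly. It is not the authors' CoRF implementation. The function names `group_sampling_probs` and `draw_candidates`, the assumption that co-data arrive as a discrete group label per variable (e.g. binned external p-values), and the use of group-mean split counts as the estimator are all hypothetical stand-ins for the paper's empirical-Bayes step.

```python
import random
from collections import defaultdict


def group_sampling_probs(usage_counts, groups):
    """Hypothetical simplified estimator (not the paper's exact method).

    usage_counts[j]: how often variable j was used to split in a base
    forest grown with uniform candidate sampling.
    groups[j]: a discrete co-data label for variable j.
    Each variable gets the mean usage of its co-data group, and the
    resulting weights are normalised to sum to 1.
    """
    totals = defaultdict(float)
    sizes = defaultdict(int)
    for count, g in zip(usage_counts, groups):
        totals[g] += count
        sizes[g] += 1
    group_mean = {g: totals[g] / sizes[g] for g in totals}
    raw = [group_mean[g] for g in groups]
    s = sum(raw)
    if s == 0:
        # No variable was ever used: fall back to uniform sampling.
        return [1.0 / len(groups)] * len(groups)
    return [w / s for w in raw]


def draw_candidates(probs, mtry, rng):
    """Draw mtry distinct variable indices for one split.

    Each successive draw picks among the not-yet-drawn variables with
    probability proportional to probs. Variables with probability 0 are
    never drawn. With uniform probs this reduces to the usual
    Random Forest candidate sampling.
    """
    available = [j for j, p in enumerate(probs) if p > 0]
    if mtry > len(available):
        raise ValueError("mtry exceeds number of variables with positive probability")
    chosen = []
    for _ in range(mtry):
        weights = [probs[j] for j in available]
        j = rng.choices(available, weights=weights, k=1)[0]
        chosen.append(j)
        available.remove(j)
    return chosen


if __name__ == "__main__":
    # Hypothetical split counts from a base forest and a two-level co-data label.
    usage = [9, 7, 8, 0, 1, 0]
    groups = ["low_p", "low_p", "low_p", "high_p", "high_p", "high_p"]
    probs = group_sampling_probs(usage, groups)
    print([round(p, 4) for p in probs])
    print(draw_candidates(probs, mtry=2, rng=random.Random(0)))
```

In this toy input the "low_p" group has mean usage 8 and the "high_p" group 1/3, so each "low_p" variable receives probability 0.32 and each "high_p" variable about 0.0133. A second forest grown with `draw_candidates` therefore proposes "low_p" variables as split candidates far more often, which is the intended effect of co-data moderation.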

Conclusion: The proposed method is able to utilize auxiliary co-data to improve the performance of a Random Forest.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5745983
DOI: http://dx.doi.org/10.1186/s12859-017-1993-1

Similar Publications

Health extension workers job satisfaction and associated factors in Ethiopia: a systematic review and meta-analysis.

BMC Health Serv Res

January 2025

Amref Health Africa in Ethiopia, EPI Technical Assistant at West Gondar Zonal Health Department, SLL Project, COVID-19 Vaccine, Gondar, Ethiopia.

Background: Ethiopian healthcare relies heavily on Health Extension Workers (HEWs), who deliver essential services to communities nationwide. By analyzing existing research, the authors explore how prevalent job satisfaction is and what factors affect it. This comprehensive analysis aims to improve HEW satisfaction through targeted interventions, ultimately leading to a more effective healthcare workforce and better health outcomes in Ethiopia.

Optical techniques, such as functional near-infrared spectroscopy (fNIRS), have high potential for the development of non-invasive wearable systems for evaluating cerebral vascular condition in aging, owing to their portability and ability to monitor real-time changes in cerebral hemodynamics. In this study, thirty-six healthy adults were measured by single-channel fNIRS to explore differences between two age groups using machine learning (ML). The subjects, measured during functional magnetic resonance imaging (fMRI) at Oulu University Hospital, were divided into young (age ≤ 32) and elderly (age ≥ 57) groups.

Diabetes is a growing health concern in developing countries, causing considerable mortality. While machine learning (ML) approaches have been widely used to improve early detection and treatment, several studies have reported low classification accuracies due to overfitting, underfitting, and data noise. This research employs parallel and sequential ensemble ML approaches paired with feature selection techniques to boost classification accuracy.

Osteoporosis is the most common bone metabolic imbalance, leading to fragility fractures, which are known to be associated with structural changes in the bone. Cortical bone accounts for 80 % of the skeletal mass and undergoes remodeling throughout life, leading to changes in its thickness and microstructure. Although many studies have quantified the different cortical bone structures using CT techniques (3D), they are often carried out on a small number of samples.

Objective: We aimed to develop a highly interpretable and effective, machine-learning based risk prediction algorithm to predict in-hospital mortality, intubation and adverse cardiovascular events in patients hospitalised with COVID-19 in Australia (AUS-COVID Score).

Materials And Methods: This prospective study across 21 hospitals included 1714 consecutive patients aged ≥ 18 in their index hospitalization with COVID-19. The dataset was separated into training (80%) and test sets (20%).
