Machine learning methods provide a powerful approach for analyzing longitudinal data, in which repeated measurements are observed for each subject over time. We boost multivariate trees to fit a novel, flexible semi-nonparametric marginal model for longitudinal data. In this model, feature effects are modeled nonparametrically, while feature-time interactions are modeled semi-nonparametrically using P-splines with an estimated smoothing parameter.
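The abstract does not spell out how the spline component is fit. The following is a minimal sketch, assuming the standard P-spline construction: an equally spaced cubic B-spline basis with a second-order difference penalty on adjacent coefficients, used here to smooth a single response over time. It is not the boosted multivariate tree model itself. Choosing the smoothing parameter by generalized cross-validation (GCV) is a stand-in assumption, not necessarily how the paper estimates it. The function names `bspline_basis`, `pspline_fit`, and `pspline_gcv` are hypothetical.

```python
import numpy as np


def bspline_basis(x, n_segments=10, degree=3):
    """Equally spaced B-spline basis evaluated at x (Cox-de Boor recursion)."""
    x = np.asarray(x, dtype=float)
    xl, xr = x.min(), x.max()
    # Nudge the right boundary so max(x) falls strictly inside the last interval.
    xr += 1e-8 * (xr - xl)
    dx = (xr - xl) / n_segments
    # Knots extended `degree` segments beyond each end of the data range.
    t = xl + dx * np.arange(-degree, n_segments + degree + 1)
    # Degree-0 basis: indicator of each knot interval [t_i, t_{i+1}).
    B = ((x[:, None] >= t[None, :-1]) & (x[:, None] < t[None, 1:])).astype(float)
    for k in range(1, degree + 1):
        # Equally spaced knots make both recursion denominators equal k * dx.
        left = (x[:, None] - t[None, :-(k + 1)]) / (k * dx) * B[:, :-1]
        right = (t[None, k + 1:] - x[:, None]) / (k * dx) * B[:, 1:]
        B = left + right
    return B  # shape (len(x), n_segments + degree)


def pspline_fit(x, y, lam, n_segments=10, degree=3, diff_order=2):
    """Penalized least squares with a difference penalty on B-spline coefficients."""
    B = bspline_basis(x, n_segments, degree)
    D = np.diff(np.eye(B.shape[1]), n=diff_order, axis=0)  # difference matrix
    A = B.T @ B + lam * (D.T @ D)
    coef = np.linalg.solve(A, B.T @ y)
    # Effective degrees of freedom: trace of the hat matrix B A^{-1} B^T.
    edf = np.trace(np.linalg.solve(A, B.T @ B))
    return B @ coef, edf


def pspline_gcv(x, y, lams=None, **kw):
    """Select the smoothing parameter by generalized cross-validation (assumption)."""
    if lams is None:
        lams = np.logspace(-4, 4, 33)
    n = len(y)
    best = None
    for lam in lams:
        fitted, edf = pspline_fit(x, y, lam, **kw)
        gcv = n * np.sum((y - fitted) ** 2) / (n - edf) ** 2
        if best is None or gcv < best[0]:
            best = (gcv, lam, fitted)
    return best[1], best[2]


# Usage on simulated data: a smooth time trend plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 200))
truth = np.sin(2 * np.pi * x)
y = truth + rng.normal(0, 0.3, x.size)
lam, fitted = pspline_gcv(x, y)
rmse = np.sqrt(np.mean((fitted - truth) ** 2))
print(f"chosen lambda = {lam:.3g}, RMSE vs truth = {rmse:.3f}")
```

A larger `lam` pulls adjacent coefficients toward a straight line (for `diff_order=2`), so the fit moves from an unpenalized regression on the 13 basis functions toward a linear trend. GCV trades residual error against effective degrees of freedom to pick a point between those extremes.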
We introduce a new approach to competing risks using random forests. Our method is fully non-parametric and can be used for selecting event-specific variables and for estimating the cumulative incidence function. We show that the method is highly effective for both prediction and variable selection in high-dimensional problems and in settings such as HIV/AIDS that involve many competing risks.
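The cumulative incidence function (CIF) of cause k is the probability of experiencing event k by time t when other event types compete. A standard nonparametric estimate is the Aalen-Johansen estimator sketched below. A forest could, in principle, apply it to the training cases in a terminal node. This is an illustration under that assumption, not the paper's implementation, and `cumulative_incidence` is a hypothetical name.

```python
import numpy as np


def cumulative_incidence(time, event, cause):
    """Aalen-Johansen CIF for `cause`; event 0 = censored, 1, 2, ... = causes."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    event_times = np.unique(time[event > 0])  # distinct event times, any cause
    surv = 1.0  # all-cause Kaplan-Meier survival just before the current time
    cif = 0.0
    values = []
    for t in event_times:
        at_risk = np.sum(time >= t)
        d_all = np.sum((time == t) & (event > 0))
        d_cause = np.sum((time == t) & (event == cause))
        # Probability of surviving to t, then failing from this cause at t.
        cif += surv * d_cause / at_risk
        surv *= 1.0 - d_all / at_risk
        values.append(cif)
    return event_times, np.array(values)


# Usage: five subjects, two competing causes, one censored observation.
time = np.array([1, 2, 3, 4, 5])
event = np.array([1, 2, 0, 1, 2])
t, cif1 = cumulative_incidence(time, event, cause=1)
_, cif2 = cumulative_incidence(time, event, cause=2)
print(t, cif1, cif2)
```

At the last time point the two CIFs sum to one minus the all-cause Kaplan-Meier survival (here 0.5 + 0.5 = 1, since survival reaches zero). Treating the competing cause as censoring and reporting 1 - KM for a single cause would overstate that cause's incidence (0.6 instead of 0.5 for cause 1 in this example).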
Background: The simultaneous contribution of hundreds of electrocardiographic (ECG) biomarkers to the prediction of long-term mortality in postmenopausal women with clinically normal resting ECGs is unknown. Methods and Results: We analyzed ECGs and all-cause mortality in 33,144 women enrolled in the Women's Health Initiative trials who had no baseline cardiovascular disease or cancer and had normal ECGs by Minnesota and Novacode criteria. Four hundred seventy-seven ECG biomarkers, encompassing global and individual ECG findings, were measured with computer algorithms.
We prove uniform consistency of Random Survival Forests (RSF), a newly introduced forest ensemble learner for the analysis of right-censored survival data. Consistency is proven under general splitting rules, bootstrapping, and random selection of variables, that is, under a true implementation of the methodology. In this setting, we show that the forest ensemble survival function converges uniformly to the true population survival function.
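One way to picture an ensemble survival function: each tree drops a case down to a terminal node and computes a Kaplan-Meier curve from the training cases in that node, and the forest averages these curves on a common time grid. The sketch below assumes the terminal-node samples are already given (growing the trees, bootstrapping, and random variable selection are omitted). The names `kaplan_meier` and `ensemble_survival` are hypothetical. It illustrates the averaging step only and is not the estimator analyzed in the consistency proof.

```python
import numpy as np


def kaplan_meier(time, event, grid):
    """Kaplan-Meier survival at each grid point (event 1 = death, 0 = censored)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    event_times = np.unique(time[event == 1])
    # Conditional survival factor 1 - d_j / n_j at each distinct event time.
    factors = np.array([
        1.0 - np.sum((time == t) & (event == 1)) / np.sum(time >= t)
        for t in event_times
    ])
    steps = np.concatenate(([1.0], np.cumprod(factors)))
    # Number of event times <= each grid point selects the right step.
    idx = np.searchsorted(event_times, grid, side="right")
    return steps[idx]


def ensemble_survival(node_samples, grid):
    """Average the terminal-node KM curves (one per tree) for a single case."""
    curves = [kaplan_meier(t, e, grid) for t, e in node_samples]
    return np.mean(curves, axis=0)


# Usage: hypothetical terminal-node samples from two trees for one case.
grid = np.array([0.0, 1.0, 2.0, 3.0])
node_samples = [
    (np.array([1, 2, 3]), np.array([1, 1, 0])),  # tree 1 terminal node
    (np.array([2, 4]), np.array([1, 1])),        # tree 2 terminal node
]
ens = ensemble_survival(node_samples, grid)
print(ens)
```

Because each per-tree curve is a non-increasing step function bounded in [0, 1], their average is as well.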
BMC Bioinformatics
February 2006
Background: DNA microarrays open up a new horizon for studying the genetic determinants of disease. The high throughput nature of these arrays creates an enormous wealth of information, but also poses a challenge to data analysis. Inferential problems become even more pronounced as experimental designs used to collect data become more complex.