Predicting the time to get back to work using statistical models and machine learning approaches.

BMC Med Res Methodol

Høyskolen Kristiania, Oslo, Norway.

Published: November 2024

The study compares machine learning approaches to traditional statistical models for survival analysis, specifically in predicting time to return to work for families with complex issues.
The results show that no model clearly outperformed the others, with both machine learning and classical models displaying low predictive power.
Although the machine learning approaches, particularly the Random Survival Forest, showed better fit metrics, this did not translate into substantially better predictive accuracy than the classical methods, indicating that these algorithms need further refinement.

Background: Whether machine learning approaches are superior to classical statistical models for survival analysis, especially when the proportional hazards assumption does not hold, is unknown.

Objectives: To compare the model performance and predictive accuracy of classical regression models and machine learning approaches using data from the Inspiring Families programme.

Methods: The Inspiring Families programme aims to support members of families with complex issues in returning to work. We explored predictors of time to return to work using proportional hazards models (semi-parametric Cox and flexible parametric Parmar-Royston, both fitted in Stata) and compared them against survival penalised regression with an Elastic Net penalty (scikit-survival), a (conditional) Survival Forest algorithm (pySurvival) and a (kernel) Survival Support Vector Machine (pySurvival).
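The penalised survival regression named in the Methods can be illustrated by the objective it minimises: a Cox negative log partial likelihood plus an Elastic Net penalty. The pure-Python sketch below only evaluates that objective for a given coefficient vector; it is not the study's scikit-survival configuration. The parameter names `alpha` and `l1_ratio`, the Breslow handling of tied times and the 1/n scaling of the likelihood term are assumptions (one common glmnet-style parametrisation; libraries differ in how they scale the two terms).

```python
import math
from typing import Sequence


def neg_log_partial_likelihood(
    times: Sequence[float],
    events: Sequence[int],
    X: Sequence[Sequence[float]],
    beta: Sequence[float],
) -> float:
    """Cox negative log partial likelihood (Breslow handling of tied times)."""
    n = len(times)
    # Linear predictor eta_i = x_i . beta for every participant.
    eta = [sum(x * b for x, b in zip(row, beta)) for row in X]
    nll = 0.0
    for i in range(n):
        if not events[i]:
            continue  # censored participants only enter through risk sets
        # Risk set: everyone still under observation at time t_i.
        risk = [eta[j] for j in range(n) if times[j] >= times[i]]
        m = max(risk)  # log-sum-exp shift for numerical stability
        log_denom = m + math.log(sum(math.exp(e - m) for e in risk))
        nll -= eta[i] - log_denom
    return nll


def elastic_net_penalty(beta: Sequence[float], alpha: float, l1_ratio: float) -> float:
    """alpha * (l1_ratio * ||beta||_1 + 0.5 * (1 - l1_ratio) * ||beta||_2^2)."""
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return alpha * (l1_ratio * l1 + 0.5 * (1.0 - l1_ratio) * l2)


def penalised_cox_objective(times, events, X, beta, alpha=0.1, l1_ratio=0.5) -> float:
    """Objective to minimise over beta: scaled partial likelihood plus penalty."""
    # Assumption: likelihood term scaled by 1/n; libraries differ on this.
    n = len(times)
    return neg_log_partial_likelihood(times, events, X, beta) / n + elastic_net_penalty(
        beta, alpha, l1_ratio
    )
```

A fitting routine (for example coordinate descent) would search for the `beta` that minimises this objective; `l1_ratio = 1` gives a pure lasso penalty and `l1_ratio = 0` a pure ridge penalty.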

Results: At baseline we obtained data on 61 binary variables from all 3161 participants. No model appeared superior, and all had low predictive power (concordance index between 0.51 and 0.61). The median time to finding the first job was about 254 days. The top five contributing variables were 'family issues and additional barriers', 'restriction of hours', 'available CV', 'self-employment considered' and 'education'. Harrell's concordance index ranged from 0.60 (Cox model) to 0.71 (Random Survival Forest), suggesting a better fit for the machine learning approaches. However, a comparison of predicted median times for a selected scenario showed only minor differences.
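The two summary measures in the Results, Harrell's concordance index and the median time to first job, can be illustrated with a short pure-Python sketch. This is not the study's code. In the C-index, a higher score is taken to mean an earlier expected event (here, an earlier return to work), and pairs with tied times are simply skipped. The median is read off a Kaplan-Meier curve as an illustrative choice; the abstract does not state how the 254-day figure was estimated.

```python
from typing import Optional, Sequence


def harrell_c_index(
    times: Sequence[float], events: Sequence[int], risk: Sequence[float]
) -> float:
    """Fraction of comparable pairs whose risk scores order the times correctly."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is comparable only if the earlier time is an event
        for j in range(n):
            if times[i] < times[j]:  # tied times are skipped in this sketch
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0  # earlier event has the higher risk
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def km_median(times: Sequence[float], events: Sequence[int]) -> Optional[float]:
    """Smallest time at which the Kaplan-Meier survival drops to 0.5 or below."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    i = 0
    while i < len(data):
        t = data[i][0]
        d = 0  # events at time t
        leaving = 0  # events plus censorings at time t
        while i < len(data) and data[i][0] == t:
            d += int(data[i][1])
            leaving += 1
            i += 1
        if d:
            surv *= 1.0 - d / n_at_risk
            if surv <= 0.5:
                return t
        n_at_risk -= leaving
    return None  # median not reached within follow-up
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is why values between 0.51 and 0.61 indicate low predictive power.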

Conclusion: Fitting a series of survival models with and without the proportional hazards assumption provides useful insight and a better interpretation of coefficients affected by non-linearities. However, this better fit does not translate into substantially higher predictive power or accuracy for the machine learning approaches. Further tuning of the machine learning algorithms may improve results.

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11606207
DOI: http://dx.doi.org/10.1186/s12874-024-02390-4

