Survival analysis is critical in many fields, particularly in healthcare, where it can guide medical decisions. Conventional survival analysis methods such as Kaplan-Meier and Cox proportional hazards models, which generate survival curves showing the probability of survival versus time, have limitations, especially for long-term prediction, because they assume that all instances follow a general population-level survival curve. Machine learning classification models, even those designed for survival prediction such as random survival forest (RSF), also struggle to provide accurate long-term predictions because of class imbalance. We improve upon traditional survival machine learning approaches through a novel framework called classification-augmented survival estimation (CASE), which treats survival as a classification task that ultimately yields survival curves. CASE begins with dataset augmentation that reduces class imbalance and can be used with any classification model. Unlike other approaches, CASE additionally provides an exact survival time prediction. We demonstrate CASE on a liver transplant case study to predict survival beyond 20 years post-transplant, finding that CASE dataset augmentation improved AUCs from 0.69 to 0.88 and F1 scores from 0.32 to 0.73. Compared to Kaplan-Meier, Cox, and RSF survival models, the CASE framework demonstrated better performance across various existing survival metrics, as well as on our novel metric, the mean of individual areas under the survival curve (mAUSC). Further, we develop novel temporal feature importance methods to understand how the importance of different features for survival may vary over time, potentially providing actionable insights in real-world survival problems.
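The abstract names mAUSC but does not define it. As a rough illustration only, the sketch below takes the name literally: it integrates each individual's predicted survival curve with the trapezoidal rule and averages the resulting areas. The function names, the input layout, and the optional normalization by the time span are assumptions of this sketch, not the authors' definition, and the abstract does not say how the metric is used to compare models.

```python
def area_under_curve(times, probs):
    """Trapezoidal area under one survival curve sampled at `times`."""
    area = 0.0
    for k in range(1, len(times)):
        area += 0.5 * (probs[k] + probs[k - 1]) * (times[k] - times[k - 1])
    return area


def mean_individual_ausc(times, survival_curves, normalize=True):
    """Mean of per-individual areas under predicted survival curves.

    times: increasing time points shared by all curves.
    survival_curves: one list of survival probabilities per individual,
        each the same length as `times`.
    normalize: if True, divide each area by the time span so that a curve
        held at 1.0 scores 1.0 (an assumption of this sketch, not taken
        from the paper).
    """
    if len(times) < 2:
        raise ValueError("need at least two time points")
    if not survival_curves:
        raise ValueError("need at least one survival curve")
    span = times[-1] - times[0]
    areas = []
    for curve in survival_curves:
        if len(curve) != len(times):
            raise ValueError("each curve must match the length of times")
        area = area_under_curve(times, curve)
        areas.append(area / span if normalize else area)
    return sum(areas) / len(areas)


# Hypothetical example: two individuals followed over 20 years.
times = [0, 10, 20]
curves = [
    [1.0, 1.0, 1.0],  # predicted to survive the whole window
    [1.0, 0.5, 0.0],  # predicted survival declines linearly to zero
]
score = mean_individual_ausc(times, curves)
```

Under this reading, a higher value means the predicted curves stay near 1 for longer on average.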
Source: PLOS, http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0315928