We experiment with recent ensemble machine learning methods for estimating healthcare costs, using Finnish data that contain rich individual-level information on healthcare costs, socioeconomic status and diagnoses from multiple registries. Our data are a random 10% sample (553,675 observations) of the Finnish population in 2017. Using annual healthcare cost in 2017 as the response variable, we compare the performance of Random Forest, Gradient Boosting Machine (GBM) and eXtreme Gradient Boosting (XGBoost) with that of linear regression. Because machine learning methods are often seen as unsuitable for risk adjustment applications owing to their relative opaqueness, we also introduce visualizations from the machine learning literature to help interpret the contribution of individual variables to the prediction. Our results show that ensemble machine learning methods can improve predictive performance, with all of them significantly outperforming linear regression, and that a certain level of interpretation can be provided for them. We also find that individual-level socioeconomic variables improve prediction accuracy and that their effect is larger for machine learning methods. However, we find that the predictions used for funding allocations are sensitive to model selection, highlighting the need for comprehensive robustness testing when estimating risk adjustment models used in applications.
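
The comparison described above, a boosted tree ensemble against linear regression on held-out data, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic (a single hypothetical `age` feature with a jump in cost after age 65), the booster is a toy gradient boosting routine built from one-split regression stumps rather than the Random Forest, GBM or XGBoost implementations used in the study, and the function names are invented here. It shows only why an ensemble of trees can capture a nonlinear cost pattern that a straight line cannot.

```python
import math
import random

random.seed(0)  # fixed seed so the synthetic example is reproducible


def simulate(n):
    """Synthetic (age, annual cost) pairs; cost jumps after age 65 (assumption)."""
    ages = [random.randint(0, 90) for _ in range(n)]
    costs = [100.0 + (4000.0 if a > 65 else 0.0) + random.gauss(0.0, 200.0)
             for a in ages]
    return ages, costs


def fit_ols(x, y):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_stump(x, residuals, thresholds):
    """Pick the threshold split that minimises squared error on the residuals."""
    best = None
    for t in thresholds:
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        if not left or not right:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - ml) ** 2 for r in left)
               + sum((r - mr) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    return best[1:]  # (threshold, left mean, right mean)


def fit_gbm(x, y, n_rounds=50, learning_rate=0.3):
    """Toy gradient boosting for squared loss: each stump fits current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        t, ml, mr = fit_stump(x, residuals, range(5, 90, 5))
        stumps.append((t, ml, mr))
        pred = [p + learning_rate * (ml if xi <= t else mr)
                for xi, p in zip(x, pred)]
    return base, stumps, learning_rate


def predict_gbm(model, x):
    """Sum the shrunken stump outputs on top of the baseline mean."""
    base, stumps, lr = model
    return [base + lr * sum(ml if xi <= t else mr for t, ml, mr in stumps)
            for xi in x]


def rmse(y, pred):
    """Root mean squared error."""
    return math.sqrt(sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y))


# Fit both models on a training sample and score them on a held-out sample.
train_x, train_y = simulate(400)
test_x, test_y = simulate(200)

intercept, slope = fit_ols(train_x, train_y)
ols_rmse = rmse(test_y, [intercept + slope * a for a in test_x])
gbm_rmse = rmse(test_y, predict_gbm(fit_gbm(train_x, train_y), test_x))

print(f"Linear regression test RMSE: {ols_rmse:.1f}")
print(f"Boosted stumps test RMSE:    {gbm_rmse:.1f}")
```

On real registry data the gap between models would depend on the covariates and the cost distribution, so the sketch says nothing about the magnitudes reported in the study.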

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11377675
DOI: http://dx.doi.org/10.1007/s10198-023-01656-w
