Fairness gaps in machine learning models for hospitalization and emergency department visit risk prediction in home healthcare patients with heart failure.

Int J Med Inform

Center for Home Care Policy & Research, VNS Health, New York, NY, USA; School of Nursing, Columbia University, New York, NY, USA; Data Science Institute, Columbia University, New York, NY, USA.

Published: November 2024

Objectives: This study aims to evaluate the fairness of machine learning (ML) models that predict hospitalization and emergency department (ED) visits in heart failure patients receiving home healthcare. We analyze biases, assess performance disparities across subpopulations, and propose solutions to improve model performance in diverse subgroups.

Methods: The study used a dataset of 12,189 home healthcare episodes collected between 2015 and 2017, including structured data (e.g., a standardized assessment tool) and unstructured data (i.e., clinical notes). ML risk prediction models, including Light Gradient Boosting Machine (LightGBM) and AutoGluon, were developed using demographic information, vital signs, comorbidities, service utilization data, and the area deprivation index (ADI) associated with the patient's home address. Fairness metrics, such as Equal Opportunity, Predictive Equality, Predictive Parity, and Statistical Parity, were calculated to evaluate model performance across subpopulations.
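The four group-fairness metrics named above have standard confusion-matrix definitions: Equal Opportunity compares true positive rates, Predictive Equality false positive rates, Predictive Parity positive predictive values, and Statistical Parity predicted positive rates across subgroups. A minimal sketch of how such per-subgroup metrics and their gaps can be computed (function names are illustrative, not taken from the paper):

```python
from collections import namedtuple

Rates = namedtuple("Rates", ["equal_opportunity", "predictive_equality",
                             "predictive_parity", "statistical_parity"])

def group_rates(y_true, y_pred):
    """Confusion-matrix-based rates for one subgroup.

    equal_opportunity   -> true positive rate,          TP / (TP + FN)
    predictive_equality -> false positive rate,         FP / (FP + TN)
    predictive_parity   -> positive predictive value,   TP / (TP + FP)
    statistical_parity  -> predicted positive rate, (TP + FP) / N
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n = len(y_true)
    div = lambda a, b: a / b if b else float("nan")
    return Rates(div(tp, tp + fn), div(fp, fp + tn),
                 div(tp, tp + fp), div(tp + fp, n))

def fairness_gap(rates_by_group, metric):
    """Absolute gap (max minus min) of one metric across subgroups."""
    vals = [getattr(r, metric) for r in rates_by_group.values()]
    return max(vals) - min(vals)
```

In practice the subgroups would be defined by intersections of demographic attributes (e.g., ethnicity, sex, and ADI level, as in this study), and the gap for each metric summarizes the disparity between the best- and worst-served groups.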

Results: Our study revealed significant disparities in model performance across demographic subgroups. For example, the Hispanic, Male, High-ADI subgroup had the highest Equal Opportunity value (0.825), 28% higher than the lowest-performing Other, Female, Low-ADI subgroup (0.644). In Predictive Parity, the gap between the highest- and lowest-performing groups was 29%; in Predictive Equality, 45%; and in Statistical Parity, the gap reached 69%.

Discussion And Conclusion: The findings highlight substantial differences in fairness metrics across diverse patient subpopulations in ML risk prediction models for heart failure patients receiving home healthcare services. Ongoing monitoring and improvement of fairness metrics are essential to mitigate biases.

DOI: http://dx.doi.org/10.1016/j.ijmedinf.2024.105534
