Age, sex and race bias in automated arrhythmia detectors.

Erick A Perez Alday, Ali B Rad, Matthew A Reyna, Nadi Sadr, Annie Gu, Qiao Li, Mircea Dumitru, Joel Xue, Dave Albert, Reza Sameni, Gari D Clifford

J Electrocardiol

Department of Biomedical Informatics, School of Medicine, Emory University, United States of America; Department of Biomedical Engineering, Georgia Institute of Technology, United States of America.

Published: December 2022

Despite the recent explosion of machine learning applied to medical data, very few studies have examined algorithmic bias in any meaningful manner, comparing across algorithms, databases, and assessment metrics. In this study, we compared the biases in sex, age, and race of 56 algorithms on over 130,000 electrocardiograms (ECGs) using several metrics and proposed a machine learning model design to reduce bias. Participants of the 2021 PhysioNet Challenge designed and implemented working, open-source algorithms to identify clinical diagnoses from 2-lead ECG recordings. We grouped the data from the training, validation, and test datasets by sex (male vs female), age (binned by decade), and race (Asian, Black, White, and Other) whenever possible. We computed recording-wise accuracy, area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), F-measure, and the Challenge Score for each of the 56 algorithms. The Mann-Whitney U and Kruskal-Wallis tests were used to assess the performance differences of algorithms across these demographic groups. Group trends revealed similar values for the AUROC, AUPRC, and F-measure for both male and female groups across the training, validation, and test sets. However, recording-wise accuracies were 20% higher (p < 0.01) and the Challenge Score 12% lower (p = 0.02) for female subjects on the test set. AUPRC, F-measure, and the Challenge Score increased with age, while recording-wise accuracy and AUROC decreased with age. The results were similar for the training and test sets, but only recording-wise accuracy (12% decrease per decade, p < 0.01), Challenge Score (1% increase per decade, p < 0.01), and AUROC (1% decrease per decade, p < 0.01) were statistically different on the test set. We observed similar AUROC, AUPRC, Challenge Score, and F-measure values across the different race categories. However, recording-wise accuracies were significantly lower for Black subjects and higher for Asian subjects on the training (31% difference, p < 0.01) and test (39% difference, p < 0.01) sets. A top-performing model was then retrained using an additional constraint that simultaneously minimized differences in performance across sex, race, and age. This resulted in a modest reduction in performance, with a significant reduction in bias. This work demonstrates that biases manifest as a function of model architecture, population, cost function, and optimization metric, all of which should be closely examined in any model.
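The group comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `groupwise_scores` and `compare_groups` are hypothetical, the scores in the demo are synthetic, and the rule of using the Mann-Whitney U test for two groups and the Kruskal-Wallis test for three or more is an assumption about how the two tests named in the abstract would map onto sex versus age or race groupings. Only `numpy` and the documented `scipy.stats.mannwhitneyu` and `scipy.stats.kruskal` functions are used.

```python
import numpy as np
from scipy.stats import kruskal, mannwhitneyu


def groupwise_scores(scores, groups):
    """Split a flat list of scores into one array per demographic group label."""
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    return {g: scores[groups == g] for g in np.unique(groups)}


def compare_groups(scores, groups):
    """Test whether a metric differs across groups.

    Assumption for this sketch: two groups (e.g. sex) use the two-sided
    Mann-Whitney U test; three or more groups (e.g. race, age decade)
    use the Kruskal-Wallis test.
    Returns (test_name, statistic, p_value).
    """
    samples = list(groupwise_scores(scores, groups).values())
    if len(samples) < 2:
        raise ValueError("need at least two groups to compare")
    if len(samples) == 2:
        stat, p = mannwhitneyu(samples[0], samples[1], alternative="two-sided")
        return "mann-whitney-u", float(stat), float(p)
    stat, p = kruskal(*samples)
    return "kruskal-wallis", float(stat), float(p)


if __name__ == "__main__":
    # Synthetic example: one accuracy value per algorithm per group
    # (56 algorithms, as in the study; the values themselves are made up).
    rng = np.random.default_rng(0)
    n_algorithms = 56

    sex_scores = np.concatenate([
        rng.normal(0.60, 0.05, n_algorithms),  # hypothetical male accuracies
        rng.normal(0.72, 0.05, n_algorithms),  # hypothetical female accuracies
    ])
    sex_labels = ["male"] * n_algorithms + ["female"] * n_algorithms
    print(compare_groups(sex_scores, sex_labels))

    race_labels = []
    race_scores = []
    for label in ["Asian", "Black", "White", "Other"]:
        race_labels += [label] * n_algorithms
        race_scores.append(rng.normal(0.65, 0.05, n_algorithms))
    print(compare_groups(np.concatenate(race_scores), race_labels))
```

Both tests are rank-based, so they make no normality assumption about the per-algorithm scores, which is one plausible reason to prefer them for metrics bounded between 0 and 1.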

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11486543
DOI: http://dx.doi.org/10.1016/j.jelectrocard.2022.07.007

Similar Publications

Predictive modeling of diazinon residual concentration in soils contaminated with potentially toxic elements: a comparative study of machine learning approaches.

Biodegradation

December 2024

Department of Civil Engineering, Islamic Azad University, Mashhad Branch, Iran.

Marzieh Mohammadi Aria Safar Vafadar Yousef Sharafi Abbas Ali Ghezelsofloo

The widespread use of pesticides, including diazinon, poses an increased risk of environmental pollution and detrimental effects on biodiversity, food security, and water resources. In this study, we investigated the impact of Potentially Toxic Elements (PTE) including Zn, Cd, V, and Mn on the degradation of diazinon in three different soils. We investigated the capability and performance of four machine learning models to predict residual pesticide concentration, including adaptive neuro-fuzzy inference system (ANFIS), support vector regression (SVR), radial basis function (RBF), and multi-layer perceptron (MLP).

Machine Learning Boosted Entropy-Engineered Synthesis of CuCo Nanometric Solid Solution Alloys for Near-100% Nitrate-to-Ammonia Selectivity.

ACS Appl Mater Interfaces

December 2024

Key Laboratory of Synthetic and Biological Colloids, Ministry of Education, School of Chemical and Material Engineering, Jiangnan University, 214122 Jiangsu, China.

Yao Hu Bo Hu Haihui Lan Jiaxuan Gong Renjing Hu

Nanometric solid solution alloys are utilized in a broad range of fields, including catalysis, energy storage, medical application, and sensor technology. Unfortunately, the synthesis of these alloys becomes increasingly challenging as the disparity between the metal elements grows, due to differences in atomic sizes, melting points, and chemical affinities. This study utilized a data-driven approach incorporating sample balancing enhancement techniques and multilayer perceptron (MLP) algorithms to improve the model's ability to handle imbalanced data, significantly boosting the efficiency of experimental parameter optimization.

A novel generative multi-task representation learning approach for predicting postoperative complications in cardiac surgery patients.

J Am Med Inform Assoc

December 2024

AI for Health Institute, Washington University in St Louis, St Louis, MO 63130, United States.

Junbo Shen Bing Xue Thomas Kannampallil Chenyang Lu Joanna Abraham

Objective: Early detection of surgical complications allows for timely therapy and proactive risk mitigation. Machine learning (ML) can be leveraged to identify and predict patient risks for postoperative complications. We developed and validated the effectiveness of predicting postoperative complications using a novel surgical Variational Autoencoder (surgVAE) that uncovers intrinsic patterns via cross-task and cross-cohort presentation learning.

De-biasing the bias: methods for improving disparity assessments with noisy group measurements.

Biometrics

October 2024

RAND Corporation, Pittsburgh, PA 15213, United States.

Solvejg Wastvedt Joshua Snoke Denis Agniel Julie Lai Marc N Elliott

Health care decisions are increasingly informed by clinical decision support algorithms, but these algorithms may perpetuate or increase racial and ethnic disparities in access to and quality of health care. Further complicating the problem, clinical data often have missing or poor quality racial and ethnic information, which can lead to misleading assessments of algorithmic bias. We present novel statistical methods that allow for the use of probabilities of racial/ethnic group membership in assessments of algorithm performance and quantify the statistical bias that results from error in these imputed group probabilities.

25th National and 11th International Annual Congress on Research and Technology of Iranian Medical Sciences Students, Urmia, Iran, 5-7 September, 2024.

Iran Biomed J

December 2024

Student Research Committee, Department of Nursing, Khalkhal University of Medical Sciences, Khalkhal, Iran.

Elmira Firouzi Vahideh Aghamohammadi Amirhosein Jalili
