Data heterogeneity in federated learning with Electronic Health Records: Case studies of risk prediction for acute kidney injury and sepsis diseases in critical care.

Suraj Rajendran Zhenxing Xu Weishen Pan Arnab Ghosh Fei Wang

PLOS Digit Health

Division of Health Informatics, Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, United States of America.

Published: March 2023

With the wider availability of healthcare data such as Electronic Health Records (EHR), an increasing number of data-driven approaches have been proposed to improve the quality of care delivery. Predictive modeling, which aims to build computational models that predict clinical risk, is a popular research topic in healthcare analytics. However, developing effective and generalizable predictive models often requires rich, diverse data from multiple clinical institutions, and concerns about the privacy of healthcare data may hinder this. Recently, federated learning (FL) has shown promise in addressing this concern. However, data heterogeneity across local participating sites may affect the prediction performance of federated models. Because acute kidney injury (AKI) and sepsis are highly prevalent among patients admitted to intensive care units (ICU), AI-based early prediction of these conditions is an important topic in critical care medicine. In this study, we take AKI and sepsis onset risk prediction in the ICU as two examples to explore the impact of data heterogeneity in the FL framework and to compare performance across frameworks. We built predictive models based on local, pooled, and FL frameworks using EHR data from multiple hospitals. The local framework used only data from each site itself. The pooled framework combined data from all sites. In the FL framework, no local site had access to other sites' data: a model was updated locally, its parameters were shared with a central aggregator that updated the federated model's parameters, and the updated parameters were then shared back with each site. We found that models built within the FL framework outperformed their local counterparts. We then analyzed variable importance discrepancies across sites and frameworks. Finally, we explored potential sources of heterogeneity within the EHR data. The different distributions of demographic profiles, medication use, and site information contributed to data heterogeneity.
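The abstract describes the FL protocol only at a high level: each site updates a model locally, sends its parameters to a central aggregator, and receives the updated federated parameters back. The following is a minimal sketch of one common instance of that loop, federated averaging (FedAvg), next to local-only and pooled baselines. Everything in it is an assumption for illustration, not the authors' method: the synthetic sites with shifted feature distributions (standing in for cross-site heterogeneity), the logistic-regression model, the sample-size-weighted averaging, and all function names and hyperparameters. The paper's actual models and aggregation rule are not given in this abstract.

```python
import math
import random

# Hypothetical toy setup: 2-feature logistic regression, weights = [bias, w1, w2].


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, x):
    return sigmoid(w[0] + w[1] * x[0] + w[2] * x[1])


def make_site(rng, n, shift):
    """Synthetic site (assumption): feature means shift per site to mimic heterogeneity,
    while the true risk function is shared across sites."""
    data = []
    for _ in range(n):
        x = [rng.gauss(shift, 1.0), rng.gauss(-shift, 1.0)]
        p = sigmoid(1.5 * x[0] - 1.0 * x[1] - 0.5)
        data.append((x, 1 if rng.random() < p else 0))
    return data


def local_train(w, data, epochs=20, lr=0.1):
    """Full-batch gradient descent on the logistic loss, starting from weights w."""
    w = list(w)
    n = len(data)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for x, y in data:
            err = predict(w, x) - y
            grad[0] += err
            grad[1] += err * x[0]
            grad[2] += err * x[1]
        w = [wi - lr * gi / n for wi, gi in zip(w, grad)]
    return w


def fedavg(sites, rounds=30, local_epochs=5, lr=0.1):
    """FedAvg sketch: each site trains from the current global weights; the aggregator
    averages the returned weights, weighted by site sample size. Raw data never leaves a site."""
    global_w = [0.0, 0.0, 0.0]
    total = sum(len(s) for s in sites)
    for _ in range(rounds):
        updates = [(local_train(global_w, s, local_epochs, lr), len(s)) for s in sites]
        global_w = [sum(w[j] * n for w, n in updates) / total for j in range(3)]
    return global_w


def accuracy(w, data):
    return sum((predict(w, x) >= 0.5) == (y == 1) for x, y in data) / len(data)


rng = random.Random(0)
shifts = [-1.0, 0.0, 1.5]   # per-site covariate shift (hypothetical)
sizes = [40, 200, 80]       # unequal site sizes (hypothetical)
train_sites = [make_site(rng, n, s) for n, s in zip(sizes, shifts)]
test = [pt for s in shifts for pt in make_site(rng, 300, s)]

local_models = [local_train([0.0] * 3, s, epochs=150) for s in train_sites]
pooled_model = local_train([0.0] * 3, [pt for s in train_sites for pt in s], epochs=150)
fed_model = fedavg(train_sites, rounds=30, local_epochs=5)

for i, w in enumerate(local_models):
    print(f"local site {i}: {accuracy(w, test):.3f}")
print(f"pooled:      {accuracy(pooled_model, test):.3f}")
print(f"federated:   {accuracy(fed_model, test):.3f}")
```

Only model weights cross site boundaries in `fedavg`; the raw `(x, y)` records stay inside each site's list, which is the privacy property the FL framework relies on. The printed accuracies depend on the random seed and the toy data, so they illustrate how the three frameworks are compared rather than reproduce the paper's results.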

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10016691
DOI: http://dx.doi.org/10.1371/journal.pdig.0000117

Top Keywords: data heterogeneity; data; federated learning; electronic health; health records; risk prediction; acute kidney; kidney injury; critical care; healthcare data

Similar Publications

Risk and protective factors of disease flare during pregnancy in systemic lupus erythematosus: a systematic review and meta-analysis.

Clin Rheumatol

January 2025

Department of Obstetrics and Gynecology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, Peking Union Medical College, No. 1 Shuaifuyuan, Beijing, 100730, China.

Yudi Yang Yangzhong Zhou Xueyang Zhang Can Huang Lingshan Liu

To synthesize available evidence on predictive factors associated with systemic lupus erythematosus (SLE) flares during pregnancy, we systematically searched MEDLINE, Embase, and the Cochrane Library through January 2024 for observational studies on risk and protective factors of SLE flares during pregnancy. Odds ratios (OR) and mean differences (MD), as well as their 95% confidence intervals (CI), were used to quantify effect sizes. We employed fixed-effect or random-effect models based on heterogeneity assessments (I² statistics).
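The abstract above chooses between fixed-effect and random-effect pooling from the I² heterogeneity statistic. A minimal sketch of that decision, assuming inverse-variance weighting, the DerSimonian-Laird estimator for the random-effects case, and a hypothetical 50% I² cut-off (the review's exact decision rule is not stated here), could look like this. Effects are taken on the log-OR scale for odds ratios or as raw values for mean differences.

```python
import math


def pool_meta(effects, variances, i2_threshold=0.5):
    """Inverse-variance meta-analysis with an I^2-based model choice (sketch).

    effects: per-study log odds ratios or mean differences.
    variances: their squared standard errors.
    i2_threshold: hypothetical cut-off, not taken from the cited review.
    Returns (model, pooled_estimate, standard_error, i2).
    """
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q and I^2 = max(0, (Q - df) / Q)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    if i2 <= i2_threshold:
        return "fixed", fixed, math.sqrt(1.0 / sw), i2
    # DerSimonian-Laird between-study variance tau^2
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)
    w_re = [1.0 / (v + tau2) for v in variances]
    est = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return "random", est, math.sqrt(1.0 / sum(w_re)), i2


print(pool_meta([0.5, 0.5, 0.5], [0.1, 0.2, 0.1]))
print(pool_meta([0.0, 1.0, 2.0], [0.01, 0.01, 0.01]))
```

For odds ratios, a 95% CI follows as exp(estimate ± 1.96 × SE) on the back-transformed scale; for mean differences, no exponentiation is needed.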

Unveiling the role of PANoptosis-related genes in breast cancer: an integrated study by multi-omics analysis and machine learning algorithms.

Breast Cancer Res Treat

January 2025

Department of Breast Surgery, Thyroid Surgery, Huangshi Central Hospital, Affiliated Hospital of Hubei Polytechnic University, No.141, Tianjin Road, Huangshi, 435000, Hubei, China.

Gang Liu Liang-Zhi Pan Jie Chen Jianying Ma

Background: The heterogeneity of breast cancer (BC) necessitates the identification of novel subtypes and prognostic models to enhance patient stratification and treatment strategies. This study aims to identify novel BC subtypes based on PANoptosis-related genes (PRGs) and construct a robust prognostic model to guide individualized treatment strategies.

Methods: Transcriptome and clinical data of BC patients were obtained from the TCGA and GEO databases.

Spatiotemporal dynamics and driving factors of energy-related carbon emissions in the Yangtze River Delta region based on nighttime light data.

Sci Rep

January 2025

School of Surveying and Land Information Engineering, Henan Polytechnic University, Jiaozuo, 454003, China.

Huazhu Xue Qianqian Ma Xiaosan Ge

Owing to China's vast area and wide regional variation in the types and efficiency of energy use, the spatiotemporal distributions of regional carbon emissions (CE) vary widely. Regional CE studies are becoming increasingly crucial for determining the future course of sustainable development worldwide. In this work, two types of nighttime light data were integrated to extend the study's temporal coverage.

Extensive benchmarking of a method that estimates external model performance from limited statistical characteristics.

NPJ Digit Med

January 2025

KI Research Institute, Kfar Malal, Israel.

Tal El-Hay Jenna M Reps Chen Yanover

Predictive model performance may deteriorate when a model is applied to data sources that were not used for training; thus, external validation is a key step in successful model deployment. As access to patient-level external data sources is typically limited, we recently proposed a method that estimates external model performance using only external summary statistics. Here, we benchmark the proposed method on multiple tasks using five large heterogeneous US data sources, where each source, in turn, plays the role of the internal source and the remaining sources serve as external ones.

A systematic review and meta-analyses of the temporal stability and convergent validity of risk preference measures.

Nat Hum Behav

January 2025

Faculty of Psychology, University of Basel, Basel, Switzerland.

Alexandra Bagaïni Yunrui Liu Madlaina Kapoor Gayoung Son Paul-Christian Bürkner

Understanding whether risk preference represents a stable, coherent trait is central to efforts aimed at explaining, predicting and preventing risk-related behaviours. We help characterize the nature of the construct by adopting a systematic review and individual participant data meta-analytic approach to summarize the temporal stability of 358 risk preference measures (33 panels, 57 samples, 579,114 respondents). Our findings reveal noteworthy heterogeneity across and within measure categories (propensity, frequency and behaviour), domains (for example, investment, occupational and alcohol consumption) and sample characteristics (for example, age).
