Data heterogeneity in federated learning with Electronic Health Records: Case studies of risk prediction for acute kidney injury and sepsis diseases in critical care.

PLOS Digit Health

Division of Health Informatics, Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, United States of America.

Published: March 2023

With the wider availability of healthcare data such as Electronic Health Records (EHR), a growing number of data-driven approaches have been proposed to improve the quality of care delivery. Predictive modeling, which aims to build computational models for predicting clinical risk, is a popular research topic in healthcare analytics. However, privacy concerns about healthcare data may hinder the development of effective, generalizable predictive models, because such models often require rich, diverse data from multiple clinical institutions. Recently, federated learning (FL) has demonstrated promise in addressing this concern. However, data heterogeneity across the participating local sites may affect the prediction performance of federated models. Because acute kidney injury (AKI) and sepsis are highly prevalent among patients admitted to intensive care units (ICU), AI-based early prediction of these conditions is an important topic in critical care medicine. In this study, we take AKI and sepsis onset risk prediction in the ICU as two examples to explore the impact of data heterogeneity in the FL framework and to compare performance across frameworks. We built predictive models based on local, pooled, and FL frameworks using EHR data from multiple hospitals. The local framework used only each site's own data. The pooled framework combined data from all sites. In the FL framework, no site had access to other sites' data: a model was updated locally, and its parameters were shared with a central aggregator, which used them to update the federated model's parameters and then shared the updated parameters back with each site. We found that models built within an FL framework outperformed their local counterparts. We then analyzed discrepancies in variable importance across sites and frameworks. Finally, we explored potential sources of heterogeneity within the EHR data. Differences in the distributions of demographic profiles, medication use, and site information contributed to data heterogeneity.
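
The FL loop described in the abstract (local update, parameter sharing with a central aggregator, redistribution to sites) can be illustrated with a minimal sketch. The abstract does not state the model type or the aggregation rule, so everything below is an assumption made for illustration: a logistic regression model, sample-size-weighted parameter averaging in the style of FedAvg, and synthetic sites whose feature distributions are shifted to mimic heterogeneity. The function names (`local_update`, `aggregate`, `federated_train`, `make_site`) are hypothetical and do not come from the paper.

```python
import math
import random


def local_update(weights, X, y, lr=0.1, epochs=5):
    """One site's local training: a few epochs of gradient descent on the
    logistic loss, starting from the current federated parameters."""
    w = list(weights)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            for j, xj in enumerate(xi):
                grad[j] += (p - yi) * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def aggregate(site_weights, site_sizes):
    """Central aggregator: sample-size-weighted average of site parameters.
    This FedAvg-style rule is an assumption, not the paper's stated method."""
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [
        sum(w[j] * n for w, n in zip(site_weights, site_sizes)) / total
        for j in range(dim)
    ]


def federated_train(sites, dim, rounds=20):
    """Run several communication rounds; only parameters leave each site."""
    global_w = [0.0] * dim
    for _ in range(rounds):
        updates = [local_update(global_w, X, y) for X, y in sites]
        sizes = [len(X) for X, _ in sites]
        global_w = aggregate(updates, sizes)
    return global_w


def make_site(n, shift, seed):
    """Synthetic site whose single feature is shifted by `shift`
    (a hypothetical stand-in for heterogeneous patient populations)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x1 = rng.gauss(shift, 1.0)
        X.append([1.0, x1])  # bias term plus one feature
        y.append(1 if x1 + rng.gauss(0.0, 0.5) > 0 else 0)
    return X, y


sites = [make_site(200, -0.5, 1), make_site(100, 0.5, 2), make_site(50, 1.0, 3)]
model = federated_train(sites, dim=2)
```

In this sketch the raw records in `sites` are only ever read inside `local_update`, while `aggregate` sees parameter vectors and sample counts, which mirrors the privacy constraint described above. A local or pooled baseline could be obtained by calling `local_update` on a single site's data or on the concatenation of all sites, respectively.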

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10016691
DOI: http://dx.doi.org/10.1371/journal.pdig.0000117
