Generating high-fidelity privacy-conscious synthetic patient data for causal effect estimation with multiple treatments.

Jingpu Shi, Dong Wang, Gino Tesei, Beau Norgeot

Front Artif Intell

Anthem AI, Palo Alto, CA, United States.

Published: September 2022

In the past decade, interest has grown exponentially in using observational data collected as part of routine healthcare practice to estimate treatment effects with causal inference models. Validating these models, however, has been challenging because the ground truth is unknown: for each person, only one treatment-outcome pair can be observed. Several efforts have tried to fill this gap with synthetic data, for which the ground truth can be generated. To date, however, these datasets have been severely limited in utility: they are modeled after small, non-representative patient populations, are dissimilar to real target populations, or provide known effects for only two cohorts (treated vs. control). In this work, we produced a large-scale, realistic synthetic dataset that provides ground-truth effects of more than 10 hypertension treatments on blood pressure outcomes. The dataset was created by modeling a nationwide cohort of more than 580,000 patients with hypertension, including each person's multi-year history of diagnoses, medications, and laboratory values. Our data generation process combines an adapted ADS-GAN model, which generates fictitious patient information, with a neural network that generates treatment outcomes. A Wasserstein distance of 0.35 between the synthetic data and the source cohort indicates that the two follow nearly identical joint distributions. Patient privacy was a primary concern for this study: the ϵ-identifiability metric, which estimates the probability that an actual patient can be identified, is 0.008%, indicating that the synthetic data carries a very low risk of identifying any actual patient. To demonstrate the dataset's usage, we measured the bias of four well-established causal inference models when estimating treatment effects on it. The approach can be readily extended to other diseases in the clinical domain and to datasets in other domains.
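The ϵ-identifiability score quoted in the abstract is a nearest-neighbor comparison between real and synthetic records. The sketch below shows one way such a score can be computed, assuming the definition from the ADS-GAN work as we understand it: a real record counts as identifiable when its nearest synthetic record is closer than its nearest other real record, and the score is the fraction of such records. The function name `epsilon_identifiability` and its `weights` parameter are hypothetical; ADS-GAN weights each feature by the inverse of its entropy, whereas this sketch defaults to uniform weights. It is an illustration, not the authors' implementation.

```python
import numpy as np


def epsilon_identifiability(real, synthetic, weights=None):
    """Hypothetical helper: fraction of real records that are closer to some
    synthetic record than to any other real record (lower is more private)."""
    real = np.asarray(real, dtype=float)
    synthetic = np.asarray(synthetic, dtype=float)
    # Assumption: uniform feature weights unless given (ADS-GAN uses 1 / entropy).
    w = np.ones(real.shape[1]) if weights is None else np.asarray(weights, dtype=float)
    rw, sw = real * w, synthetic * w

    # r_i: distance from each real record to its nearest *other* real record.
    d_real = np.linalg.norm(rw[:, None, :] - rw[None, :, :], axis=-1)
    np.fill_diagonal(d_real, np.inf)  # exclude the record itself
    r = d_real.min(axis=1)

    # r_hat_i: distance from each real record to its nearest synthetic record.
    d_syn = np.linalg.norm(rw[:, None, :] - sw[None, :, :], axis=-1)
    r_hat = d_syn.min(axis=1)

    # Score = share of real records with r_hat_i < r_i.
    return float(np.mean(r_hat < r))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(200, 5))
    # Near-copies of real records leak identity: every record is "identifiable".
    print(epsilon_identifiability(real, real + 1e-6))  # 1.0
    # Independent draws from the same distribution give an intermediate score.
    print(epsilon_identifiability(real, rng.normal(size=(200, 5))))
```

A dataset is called ϵ-identifiable when this score falls below ϵ. Because the sketch builds full pairwise distance matrices, it is only practical for small samples; for a cohort on the scale of 580,000 patients, a tree-based or approximate nearest-neighbor index would be the natural substitute.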

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9515575
DOI: http://dx.doi.org/10.3389/frai.2022.918813

Similar Publications

Estimating hip impact velocity and acceleration from video-captured falls using a pose estimation algorithm.

Sci Rep

January 2025

Department of Exercise Science, Syracuse University, 150 Crouse Dr, Syracuse, NY, 13244, USA.

Reese Michaels, Tiago V Barreira, Stephen N Robinovitch, Jacob J Sosnoff, Yaejin Moon

Analyzing video footage of falls in older adults has emerged as an alternative to traditional laboratory studies. However, this approach is limited by the labor-intensive process of manually labeling body parts. To address this limitation, we aimed to validate the use of an AI-based pose estimation algorithm (OpenPose) for assessing hip impact velocity and acceleration in video-captured falls.

Performance evaluation of a machine learning-based methodology using dynamical features to detect nonwear intervals in actigraphy data in a free-living setting.

Sleep Health

January 2025

Department of Human Development and Family Studies, Pennsylvania State University, University Park, Pennsylvania, USA.

Jyotirmoy Nirupam Das, Linying Ji, Yuqi Shen, Soundar Kumara, Orfeu M Buxton

Goal And Aims: One challenge in using wearable sensors is nonwear time. Without a nonwear (e.g.

An accurately supervised motion-aware deep network for non-contact pain assessment of trigeminal neuralgia mouse model.

J Oral Facial Pain Headache

March 2024

Department of Oral and Maxillofacial Surgery, Peking University School of Stomatology, 100081 Beijing, China.

Zhiheng Feng, Mingcai Chen, Jue Zhang, Xin Peng

Pain assessment in trigeminal neuralgia (TN) mouse models is essential for exploring its pathophysiology and developing effective analgesics. However, pain assessment methods for TN mouse models have not been widely studied, resulting in a critical gap in our understanding of TN. With the rapid advancement of deep learning, numerous pain assessment methods based on deep learning have emerged.

Comparable performance between automatic and manual laryngeal and hypopharyngeal GTV delineations validated with pathology.

Int J Radiat Oncol Biol Phys

January 2025

Department of Radiotherapy, University Medical Center Utrecht, Utrecht, The Netherlands.

Koen M Kuijer, Hilde J G Smits, Patricia A H Doornaert, Kenan Niu, Mark H F Savenije

Purpose: Deep learning is a promising approach to increase reproducibility and time-efficiency of GTV delineation in head and neck cancer, but model evaluation primarily relies on manual GTV delineations as reference annotation, which are subjective and tend to overestimate tumor volume. This study aimed to validate a deep learning model for laryngeal and hypopharyngeal GTV segmentation with pathology and to compare its performance with clinicians' manual delineations.

Materials And Methods: A retrospective dataset of 193 laryngeal and hypopharyngeal cancer patients was used to train a deep learning model with clinical GTV delineations as reference.

Contour uncertainty assessment for MD-omitted daily adaptive online head and neck radiotherapy.

Radiother Oncol

January 2025

Department of Radiation Oncology, University of Texas Southwestern Medical Center, Dallas, TX, USA; Medical Artificial Intelligence and Automation Laboratory, Department of Radiation Oncology, University of Texas Southwestern Medical Center, Dallas, TX, USA.

Chien-Yi Liao, Austen Matthew Maniscalco, Hengrui Zhao, Ti Bai, Byongsu Choi

Background And Purpose: Daily online adaptive radiotherapy (DART) increases treatment accuracy by crafting customized daily plans that adjust to the patient's daily setup and anatomy. Routine application of DART is limited by its resource-intensive processes. This study proposes a novel DART strategy for head and neck squamous cell carcinoma (HNSCC) that automates the process by propagating physician-edited treatment contours for each fraction.
