In the past decade, there has been exponentially growing interest in using observational data collected as part of routine healthcare practice to determine treatment effects with causal inference models. Validating these models, however, has been a challenge because the ground truth is unknown: only one treatment-outcome pair can be observed for each person. There have been multiple efforts to fill this void using synthetic data, for which the ground truth can be generated. However, to date, these datasets have been severely limited in their utility: they have been modeled after small, non-representative patient populations, have been dissimilar to real target populations, or have provided known effects for only two cohorts (treated vs. control). In this work, we produced a large-scale, realistic synthetic dataset that provides ground truth effects for over 10 hypertension treatments on blood pressure outcomes. The synthetic dataset was created by modeling a nationwide cohort of more than 580,000 hypertension patients, including each person's multi-year history of diagnoses, medications, and laboratory values. We designed a data generation process that combines an adapted ADS-GAN model for generating fictitious patient information with a neural network for generating treatment outcomes. A Wasserstein distance of 0.35 demonstrates that our synthetic data follows a nearly identical joint distribution to the patient cohort used to generate it. Patient privacy was a primary concern for this study: the ϵ-identifiability metric, which estimates the probability of actual patients being identified, is 0.008%, ensuring that our synthetic data cannot be used to identify any actual patients. To demonstrate its usage, we used this dataset to test the bias in causal effect estimation of four well-established models. The approach we used can be readily extended to other types of diseases in the clinical domain, and to datasets in other domains as well.
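Two evaluation ideas in this abstract can be made concrete with a small sketch: the ϵ-identifiability check (the share of real patients for whom some synthetic record lies closer than any other real patient) and the use of known potential outcomes to measure an estimator's bias. This is a minimal illustration under stated assumptions, not the authors' pipeline. The function names, the plain unweighted Euclidean distance (the formulation used with ADS-GAN weights each feature), and the one-covariate toy outcome model are all hypothetical.

```python
import numpy as np


def epsilon_identifiability(real: np.ndarray, synth: np.ndarray) -> float:
    """Share of real records whose nearest synthetic record is closer than
    their nearest *other* real record.

    Assumption: unweighted Euclidean distance and O(n * m) memory, so this
    is only suitable for small illustrative arrays.
    """
    # Distance from each real record to every other real record.
    d_real = np.linalg.norm(real[:, None, :] - real[None, :, :], axis=-1)
    np.fill_diagonal(d_real, np.inf)  # ignore each record's distance to itself
    r_real = d_real.min(axis=1)

    # Distance from each real record to its closest synthetic record.
    d_synth = np.linalg.norm(real[:, None, :] - synth[None, :, :], axis=-1)
    r_synth = d_synth.min(axis=1)

    # A real record counts as identifiable if a synthetic record sits closer
    # to it than any other real patient does.
    return float(np.mean(r_synth < r_real))


def true_effect(potential: np.ndarray, t: int, ref: int = 0) -> float:
    """Ground-truth average effect of treatment t versus ref.

    potential[i, k] is the simulated outcome of patient i under treatment k,
    which a synthetic generator can provide but real data never does.
    """
    return float(np.mean(potential[:, t] - potential[:, ref]))


def naive_effect(y: np.ndarray, assigned: np.ndarray, t: int, ref: int = 0) -> float:
    """Unadjusted difference in mean observed outcomes (a deliberately
    simple stand-in for a causal model)."""
    return float(y[assigned == t].mean() - y[assigned == ref].mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 10_000

    # Hypothetical toy generator: x is a standardized severity score, and
    # treatment 1 lowers the outcome (e.g. systolic BP) by 5 units for everyone.
    x = rng.normal(size=n)
    noise = rng.normal(size=n)
    potential = np.column_stack([140 + 10 * x + noise, 135 + 10 * x + noise])

    # Confounded assignment: more severe patients are more likely to be treated.
    p_treat = 1.0 / (1.0 + np.exp(-2.0 * x))
    assigned = (rng.random(n) < p_treat).astype(int)
    y = potential[np.arange(n), assigned]  # only one outcome per patient is observed

    truth = true_effect(potential, t=1)
    estimate = naive_effect(y, assigned, t=1)
    print(f"true effect {truth:.2f}, naive estimate {estimate:.2f}, bias {estimate - truth:.2f}")

    # Identifiability of a toy "synthetic" set drawn independently of x.
    real = np.column_stack([x[:500], y[:500]])
    synth = np.column_stack([rng.normal(size=500), rng.normal(140, 12, size=500)])
    print(f"epsilon-identifiability: {epsilon_identifiability(real, synth):.3f}")
```

Because treated patients in this toy setup are sicker, the naive estimate absorbs the confounding by `x`; that gap is exactly what a dataset with known effects makes measurable. For a cohort of hundreds of thousands of patients, a nearest-neighbour index such as `scipy.spatial.cKDTree` would replace the dense pairwise-distance matrices.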

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9515575
DOI: http://dx.doi.org/10.3389/frai.2022.918813
