AI Article Synopsis

  • This study aims to create a predictive model for cardiovascular disease (CVD) in breast cancer survivors using diverse data from the All of Us Research Program, focusing on fairness across different demographics.
  • The researchers developed a universal data pipeline to integrate various data types, such as electronic health records, patient surveys, and genomic information, and applied models like Adaptive Lasso and Random Forest to predict CVD outcomes over a 10-year span.
  • Results show that the Adaptive Lasso model performed consistently across most CVD outcomes, while the Random Forest model was particularly strong for certain events such as transient ischemic attack; age and previous coronary events were the dominant predictors, and clustering labels for social determinants of health pointed to the additional influence of social factors on patient outcomes.

Article Abstract

Objective: This study leverages the diverse dataset of the All of Us Research Program (All of Us) to develop a predictive model for cardiovascular disease (CVD) in breast cancer (BC) survivors. Central to this effort is a robust data integration pipeline that synthesizes electronic health records (EHRs), patient surveys, and genomic data while upholding fairness across demographic variables.

Materials And Methods: We developed a universal data wrangling pipeline to process and merge the heterogeneous data sources of the All of Us dataset, address missingness and variance in the data, and align disparate data modalities into a coherent framework for analysis. Using a composite feature set including EHR, lifestyle, and social determinants of health (SDoH) data, we then employed Adaptive Lasso and Random Forest regression models to predict 6 CVD outcomes. The models were evaluated using the concordance index (c-index) and the time-dependent area under the receiver operating characteristic curve (AUC) over a 10-year period.
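As an illustration of the c-index mentioned above, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is not the authors' code: the function name `concordance_index`, its argument names, and the toy data are assumptions made for this example, and pairs with tied follow-up times are skipped for simplicity.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's c-index for right-censored survival data (illustrative sketch).

    times:       observed follow-up time for each patient
    events:      1 if the event was observed, 0 if the patient was censored
    risk_scores: model output, where a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are skipped
        # a is the patient with the shorter follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # the earlier event received the higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four patients, the third one censored at time 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
c_perfect = concordance_index(times, events, [0.9, 0.7, 0.4, 0.1])
c_partial = concordance_index(times, events, [0.9, 0.2, 0.4, 0.1])
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 corresponds to random ranking, and values below 0.5 indicate inverted rankings. Established survival-analysis libraries provide tested implementations with more careful tie handling.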

Results: The Adaptive Lasso model showed consistent performance across most CVD outcomes, while the Random Forest model excelled particularly in predicting outcomes such as transient ischemic attack when incorporating the full multi-modal feature set. Feature importance analysis revealed age and previous coronary events as dominant predictors across CVD outcomes, with SDoH clustering labels highlighting the nuanced impact of social factors.

Discussion: The development of both a Cox-based predictive model and a Random Forest regression model demonstrates how extensively All of Us can be applied to integrate EHRs and patient surveys in support of precision medicine. The inclusion of SDoH clustering labels revealed the significant impact of sociobehavioral factors on patient outcomes, emphasizing the importance of comprehensive health determinants in predictive models. Despite these advancements, limitations include the exclusion of genetic data, the broad categorization of CVD conditions, and the need for fairness analyses to ensure equitable model performance across diverse populations. Future work should refine clinical and social variable measurements, incorporate advanced imputation techniques, and explore additional predictive algorithms to enhance model precision and fairness.

Conclusion: This study demonstrates the viability of the diverse All of Us dataset for developing a multi-modality predictive model for CVD risk stratification in BC survivors. The data integration pipeline and subsequent predictive models establish a methodological foundation for future research into personalized healthcare.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11631116
DOI: http://dx.doi.org/10.1093/jamia/ocae199

Publication Analysis

Top Keywords

predictive model: 12
random forest: 12
cvd outcomes: 12
data: 9
breast cancer: 8
model: 8
data integration: 8
integration pipeline: 8
patient surveys: 8
feature set: 8
