Comparison of predicting cardiovascular disease hospitalization using individual, ZIP code-derived, and machine learning model-predicted educational attainment in New York City.

Kullaya Takkavatakarn, Yang Dai, Huei Hsun Wen, Justin Kauffman, Alexander Charney, Steven G Coca, Girish N Nadkarni, Lili Chan

PLoS One

Division of Nephrology, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, United States of America.

Published: February 2024

Researchers explored using machine learning to predict individual educational attainment, moving beyond just ZIP code data in health studies.
They analyzed data from over 20,000 participants in New York City, testing various predictive models for cardiovascular hospitalization based on educational attainment data.
The findings showed that machine learning-predicted education levels agreed with survey data more often than ZIP code-based education did (67% versus 47% concordance), and that models using them predicted cardiovascular hospitalization better than models using ZIP code-based education.

Background: Area-level social determinants of health (SDOH) based on patients' ZIP codes or census tracts have commonly been used in research in place of individual-level SDOH. To our knowledge, it is unknown whether machine learning (ML) can be used to derive individual SDOH measures, specifically individual educational attainment.

Methods: This is a retrospective study using data from the Mount Sinai BioMe Biobank. We included participants who completed a validated questionnaire on educational attainment and had home addresses in New York City. ZIP code-level education was derived from the American Community Survey, matched for the participant's gender and race/ethnicity. We tested several algorithms to predict individual educational attainment from routinely collected clinical and demographic data. To evaluate how different measures of educational attainment affect model performance, we developed three distinct models for predicting cardiovascular disease (CVD) hospitalization. Educational attainment was entered into the models as either survey-derived, ZIP code-derived, or ML-predicted educational attainment.
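The three-model design described in the Methods can be illustrated with a minimal sketch: the same clinical and demographic features are reused, and only the source of the education variable changes. The field names, category labels, and the helper `build_feature_rows` are hypothetical, and the two participant records are made-up illustrative data, not data from the BioMe Biobank.

```python
from typing import Any, Dict, List

# Hypothetical labels for the three education sources compared in the study.
EDUCATION_SOURCES = ("survey", "zip_code", "ml_predicted")


def build_feature_rows(
    participants: List[Dict[str, Any]], source: str
) -> List[Dict[str, Any]]:
    """Return one feature row per participant, taking education from `source`."""
    if source not in EDUCATION_SOURCES:
        raise ValueError(f"unknown education source: {source!r}")
    rows = []
    for person in participants:
        row = dict(person["clinical"])  # copy the shared features
        row["education"] = person["education"][source]
        rows.append(row)
    return rows


# Made-up illustrative records (not study data).
participants = [
    {
        "clinical": {"age": 61, "diabetes": 1},
        "education": {
            "survey": "college",
            "zip_code": "high_school",
            "ml_predicted": "college",
        },
    },
    {
        "clinical": {"age": 45, "diabetes": 0},
        "education": {
            "survey": "high_school",
            "zip_code": "high_school",
            "ml_predicted": "college",
        },
    },
]

# One feature set per education source; each would feed its own outcome model.
feature_sets = {
    source: build_feature_rows(participants, source)
    for source in EDUCATION_SOURCES
}
```

Keeping every other feature identical across the three feature sets means that any difference in downstream model performance can be attributed to the education measure alone.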

Results: A total of 20,805 participants met inclusion criteria. Concordance between survey and ZIP code-derived education was 47%, while the concordance between survey and ML model-predicted education was 67%. A total of 13,715 patients from the cohort were included in our CVD hospitalization prediction models, of whom 1,538 (11.2%) had a history of CVD hospitalization. The AUROC of the model predicting CVD hospitalization using survey-derived education was significantly higher than that of the model using ZIP code-level education (0.77 versus 0.72; p < 0.001) and that of the model using ML model-predicted education (0.77 versus 0.75; p < 0.001). The AUROC for the model using ML model-predicted education was also significantly higher than that for the model using ZIP code-level education (p = 0.003).
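The two metrics reported above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. It assumes that concordance means simple percent agreement between two categorical education labels, which the abstract does not state explicitly, and it computes AUROC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as half. The function names and toy inputs are hypothetical. The significance test used to compare AUROCs is not named in the abstract and is not reproduced here.

```python
from typing import Hashable, Sequence


def concordance(reference: Sequence[Hashable], candidate: Sequence[Hashable]) -> float:
    """Fraction of participants whose two education labels agree (assumed definition)."""
    if len(reference) != len(candidate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(r == c for r, c in zip(reference, candidate))
    return matches / len(reference)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via pairwise comparison of positive and negative scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy inputs for illustration only (not study data).
survey = ["high_school", "college", "high_school", "graduate"]
zip_code = ["high_school", "high_school", "college", "graduate"]
ml_predicted = ["high_school", "college", "college", "graduate"]

print(concordance(survey, zip_code))      # 0.5
print(concordance(survey, ml_predicted))  # 0.75
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise AUROC is quadratic in the number of cases, which is fine for a demonstration but slow on a cohort of thousands; a rank-based implementation from a statistics library would be the usual choice in practice.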

Conclusion: The concordance of survey and ZIP code-level educational attainment in NYC was low. As expected, the model utilizing survey-derived education achieved the highest performance. The model incorporating our ML model-predicted education outperformed the model relying on ZIP code-derived education. Implementing ML techniques can improve the accuracy of SDOH data and consequently increase the predictive performance of outcome models.

Download full-text PDF	Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10852236	PMC
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0297919	PLOS

Publication Analysis

Top Keywords: educational attainment, zip code-derived, zip code-level, cvd hospitalization, model-predicted education, code-level education, concordance survey, education, zip, predicting cardiovascular
