Missing values (NA) are common in cancer research and can arise from data protection constraints, data loss, or missing follow-up data. Such incomplete patient information can affect prediction models and other data analyses. Imputation methods are one tool for dealing with NA. Cancer data are often recorded in an ordered categorical form, such as tumour grading and staging, which requires methods suited to ordered categories. This work compares mode imputation, k-nearest-neighbour (knn) imputation and, in the context of Multiple Imputation by Chained Equations (MICE), a proportional odds logistic regression model (mice_polr) and random forest (mice_rf) on a real-world prostate cancer dataset provided by the Cancer Registry of Rhineland-Palatinate in Germany. Our dataset contains the information relevant for the risk classification of patients and the time between date of diagnosis and date of death. For the imputation comparison, we use Rubin's (1974) Missing Completely At Random (MCAR) mechanism to remove 10%, 20%, 30%, and 50% of the observations. The results are evaluated and ranked based on the accuracy per patient. Mice_rf performs significantly better than the other methods at every percentage of NA, followed by knn, while mice_polr performs significantly worse than the other methods. Furthermore, our findings indicate that the accuracy of imputation methods increases with a lower number of categories, a relatively even proportion of patients in the categories, or a majority of patients in a particular category.
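The evaluation setup described in the abstract (remove values under MCAR, impute, score against the known truth) can be illustrated with a minimal Python sketch. It is not the authors' implementation: it covers only mode imputation on a single synthetic ordered variable, the function names `mask_mcar`, `impute_mode`, and `imputation_accuracy` are hypothetical, and accuracy is computed per masked entry, which may differ from the paper's "accuracy per patient".

```python
import random
from collections import Counter


def mask_mcar(values, fraction, rng):
    """Return a copy of `values` with `fraction` of entries set to None.

    Positions are drawn uniformly at random, independent of the data,
    which is the defining property of an MCAR mechanism.
    """
    n_missing = round(len(values) * fraction)
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]


def impute_mode(values):
    """Fill every None with the most frequent observed category."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def imputation_accuracy(truth, masked, imputed):
    """Share of masked entries whose imputed value equals the true value."""
    idx = [i for i, v in enumerate(masked) if v is None]
    if not idx:
        return float("nan")
    return sum(truth[i] == imputed[i] for i in idx) / len(idx)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic tumour grades for illustration only (NOT registry data);
    # the category weights are an arbitrary assumption.
    grades = rng.choices(["G1", "G2", "G3"], weights=[0.2, 0.6, 0.2], k=1000)
    for fraction in (0.1, 0.2, 0.3, 0.5):
        masked = mask_mcar(grades, fraction, rng)
        imputed = impute_mode(masked)
        acc = imputation_accuracy(grades, masked, imputed)
        print(f"{fraction:.0%} missing: accuracy {acc:.3f}")
```

Because mode imputation always predicts the majority category, its accuracy in this sketch tracks the share of that category among the masked entries, which is consistent with the abstract's observation that a dominant category raises accuracy.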
Source (DOI): http://dx.doi.org/10.3233/SHTI240780
Environ Epidemiol
February 2025
Department of Public Health Sciences, University of Rochester Medical Center, Rochester, New York.
Background: Sex steroid hormones are critical for maintaining pregnancy and optimal fetal development. Air pollutants are potential endocrine disruptors that may disturb sex steroidogenesis during pregnancy, potentially leading to adverse health outcomes.
Methods: In the Environmental influences on Child Health Outcomes Understanding Pregnancy Signals and Infant Development pregnancy cohort (Rochester, NY), sex steroid concentrations were collected at study visits in early-, mid-, and late-pregnancy in 299 participants.
Front Public Health
January 2025
Centre for Health Economics Research and Modelling Infectious Diseases, Vaccine and Infectious Disease Institute, University of Antwerp, Antwerp, Belgium.
Introduction: In relatively wealthy countries, substantial between-country variability in COVID-19 vaccination coverage occurred. We aimed to identify influential national-level determinants of COVID-19 vaccine uptake at different COVID-19 pandemic stages in such countries.
Methods: We considered over 50 macro-level demographic, healthcare resource, disease burden, political, socio-economic, labor, cultural, and lifestyle indicators as explanatory factors, and coverage with at least one dose by June 2021, completed initial vaccination protocols by December 2021, and booster doses by June 2022 as outcomes.
Alzheimers Dement (N Y)
January 2025
Indiana Alzheimer Disease Research Center and Center for Neuroimaging, Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, Indiana, USA.
Introduction: The exponential growth of genomic datasets necessitates advanced analytical tools to effectively identify genetic loci from large-scale high throughput sequencing data. This study presents Deep-Block, a multi-stage deep learning framework that incorporates biological knowledge into its AI architecture to identify genetic regions as significantly associated with Alzheimer's disease (AD). The framework employs a three-stage approach: (1) genome segmentation based on linkage disequilibrium (LD) patterns, (2) selection of relevant LD blocks using sparse attention mechanisms, and (3) application of TabNet and Random Forest algorithms to quantify single nucleotide polymorphism (SNP) feature importance, thereby identifying genetic factors contributing to AD risk.
Pediatr Obes
January 2025
Department of Women's and Children's Health, Dunedin School of Medicine, University of Otago, Dunedin, New Zealand.
Objective: To determine whether BMI differences observed at 5 years of age, from early intervention in infancy, remained apparent at 11 years.
Methods: Participants (n = 734) from the original randomized controlled trial (n = 802) underwent measures of body mass index (BMI), body composition (DXA), sleep and physical activity (24-h accelerometry, questionnaire), diet (repeated 24-h recalls), screen time (daily diaries), wellbeing (CHU-9D, WHO-5), and family functioning (McMaster FAD) around their 11th birthday. Following multiple imputation, regression models explored the effects of two interventions ('Sleep' vs.
BMC Public Health
January 2025
Institute for Occupational and Maritime Medicine (ZfAM), University Medical Center Hamburg-Eppendorf (UKE), Hamburg, Germany.
Background: Coronary heart disease (CHD) is the leading cause of death among adults in Germany. There is evidence that occupational exposure to particulate matter, noise, psychosocial stressors, shift work and high physical workload are associated with CHD. The aim of this study is to identify occupations that are associated with CHD and to elaborate on occupational exposures associated with CHD by using the job exposure matrix (JEM) BAuA-JEM ETB 2018 in a German study population.