Flexible variable selection in the presence of missing data.

Int J Biostat

Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, USA.

Published: November 2024

In many applications, it is of interest to identify a parsimonious set of features, or panel, from multiple candidates that achieves a desired level of performance in predicting a response. This task is often complicated in practice by missing data arising from the sampling design or other random mechanisms. Most recent work on variable selection in missing data contexts relies in some part on a finite-dimensional statistical model, e.g., a generalized or penalized linear model. In cases where this model is misspecified, the selected variables may not all be truly scientifically relevant and can result in panels with suboptimal classification performance. To address this limitation, we propose a nonparametric variable selection algorithm combined with multiple imputation to develop flexible panels in the presence of missing-at-random data. We outline strategies based on the proposed algorithm that achieve control of commonly used error rates. Through simulations, we show that our proposal has good operating characteristics and results in panels with higher classification and variable selection performance compared to several existing penalized regression approaches in cases where a generalized linear model is misspecified. Finally, we use the proposed method to develop biomarker panels for separating pancreatic cysts with differing malignancy potential in a setting where complicated missingness in the biomarkers arose due to limited specimen volumes.
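The general pattern the abstract describes (impute several times, select variables on each completed dataset, then pool the selections) can be illustrated with a minimal sketch. This is not the authors' algorithm. The imputation step (random draws from each column's observed values), the rank-correlation selector, and the names `impute_once`, `rank_scores`, `select_panel`, `k`, and `threshold` are all hypothetical stand-ins chosen to keep the example short and self-contained.

```python
import numpy as np

def impute_once(X, rng):
    """Toy imputation (assumption): fill each missing entry with a random
    draw from the observed values in the same column."""
    X_imp = X.copy()
    for j in range(X.shape[1]):
        missing = np.isnan(X[:, j])
        observed = X[~missing, j]
        X_imp[missing, j] = rng.choice(observed, size=int(missing.sum()))
    return X_imp

def rank_scores(X, y):
    """Stand-in importance score (assumption): absolute correlation between
    the within-column ranks of each feature and the response."""
    ranks = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    ranks -= ranks.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(ranks, axis=0) * np.linalg.norm(yc)
    return np.abs(ranks.T @ yc) / denom

def select_panel(X, y, k=2, n_imputations=10, threshold=0.5, seed=0):
    """Select the top-k features on each imputed dataset, then keep the
    features chosen in at least `threshold` of the imputations."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(X.shape[1])
    for _ in range(n_imputations):
        scores = rank_scores(impute_once(X, rng), y)
        counts[np.argsort(scores)[-k:]] += 1
    keep = counts / n_imputations >= threshold
    return sorted(np.flatnonzero(keep).tolist())
```

Pooling by selection frequency is only one simple way to combine results across imputations. The abstract says the proposed strategies control commonly used error rates, which this sketch makes no attempt to do.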


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11323294
DOI: http://dx.doi.org/10.1515/ijb-2023-0059

Publication Analysis

Top Keywords

variable selection: 16
missing data: 12
linear model: 8
model misspecified: 8
flexible variable: 4
selection: 4
selection presence: 4
presence missing: 4
data: 4
data applications: 4

Similar Publications

The association between multilingual experience factors and cognitive functioning in older adults: A Lifelines study.

J Gerontol B Psychol Sci Soc Sci

January 2025

Linguistics and English as a Second Language, Faculty of Arts, University of Groningen, Groningen, the Netherlands.

Objectives: The complex life experience of speaking two or more languages has been suggested to preserve cognition in older adulthood. This study aimed to investigate this further by examining the relationship between multilingual experience variables and cognitive functioning in a large cohort of older adults in the diversely multilingual north of the Netherlands.

Method: 11,332 older individuals participating in the Lifelines Cohort Study completed a language experience questionnaire.


Presurgical anxiety and acute postsurgical pain predict worse chronic pain profiles after total knee/hip arthroplasty.

Arch Orthop Trauma Surg

January 2025

Life and Health Sciences Research Institute (ICVS), School of Medicine, University of Minho, Campus de Gualtar, Braga, 4710-057, Portugal.

Introduction: Total joint arthroplasties generally achieve good outcomes, but chronic pain and disability are a significant burden after these interventions. Acknowledging relevant risk factors can inform preventive strategies. This study aimed to identify chronic pain profiles 6 months after arthroplasty using the ICD-11 (International Classification of Diseases) classification and to find pre- and postsurgical predictors of these profiles.


Introduction: There is a lack of clinical evidence supporting the decision-making process between high tibial osteotomy (HTO) and unicompartmental knee arthroplasty (UKA) in gray-zone indications, such as moderate medial osteoarthritis with moderate varus alignment. This study compared the outcomes between HTO and UKA in such cases and assessed the risk factors for not maintaining clinical improvements.

Materials and Methods: We retrospectively reviewed 65 opening-wedge HTOs and 55 UKAs with moderate medial osteoarthritis (Kellgren-Lawrence grade ≥ 3 and Ahlbäck grade < 3) and moderate varus alignment (5° < Hip-Knee-Ankle angle < 10°) over 3 years of follow-up.


The Indian Himalayan Region (IHR) supports a plethora of biodiversity with a unique assemblage of many charismatic and endemic species. We assessed the genetic diversity, demographic history, and habitat suitability of blue sheep (Pseudois nayaur) in the IHR through the analysis of the mitochondrial DNA (mtDNA) control region (CR) and Cytochrome b gene, and 14 ecological predictor variables. We observed high genetic divergence and designated them into two genetic lineage groups, i.


Cohort-based nomogram for forensic prediction of SCD: a single-center pilot study.

Forensic Sci Med Pathol

January 2025

Department of Forensic Pathology, School of Forensic Medicine, China Medical University, Shenyang, 110122, P. R. China.

Forensic diagnosis of sudden cardiac death (SCD) is an extremely important part of routine forensic practice. The present study aimed to develop and validate nomograms for predicting the probability of SCD, with special regard to ischemic heart disease-induced SCD (IHD-induced SCD), based on multiple autopsy variables. A total of 3322 cases were enrolled and randomly assigned to a training cohort (n = 2325) and a validation cohort (n = 997).

