The availability of large-scale biobanks linking genetic data, rich phenotypes, and biological measures offers a powerful opportunity for scientific discovery. However, real-world collections frequently have extensive missingness. Although missing data can be predicted, prediction performance is significantly impaired by the block-wise missingness inherent to many biobanks. To address this, we developed Missingness Adapted Group-wise Informed Clustered (MAGIC)-LASSO, which performs hierarchical clustering of variables based on their missingness, followed by sequential Group LASSO within clusters. Variables are pre-filtered for missingness and for balance between the training and target sets, and final models are built by stepwise inclusion of features ranked by completeness. This research was conducted using the UK Biobank (>500k participants) to predict unmeasured Alcohol Use Disorders Identification Test (AUDIT) scores. The phenotypic correlation between measured and predicted total scores was 0.67, while the genetic correlation between independent subjects was high (>0.86). Phenotypic and genetic correlations in the real-data application, as well as in simulations, demonstrate that the method is accurate and useful for increasing power to discover genetic loci.
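The abstract describes the pipeline only at a high level. The following is a minimal sketch of the missingness-driven steps, written under assumptions that the source does not state: missing values are coded as `None`, the distance between two variables is the Jaccard distance between their sets of missing rows, clustering uses average linkage with an illustrative cut `threshold`, and `max_missing` is an illustrative pre-filter. It omits the training/target balance filter and the within-cluster Group LASSO fits, and all function names are hypothetical rather than the authors' code.

```python
from itertools import combinations


def missing_sets(rows):
    """For each variable (column), the frozenset of row indices where it is None."""
    n_vars = len(rows[0])
    return [frozenset(i for i, row in enumerate(rows) if row[j] is None)
            for j in range(n_vars)]


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two fully observed variables (empty sets) get 0."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def filter_by_missingness(masks, n_rows, max_missing):
    """Indices of variables whose missing fraction is at most max_missing (assumed filter)."""
    return [j for j, m in enumerate(masks) if len(m) / n_rows <= max_missing]


def cluster_by_missingness(masks, keep, threshold):
    """Average-linkage agglomerative clustering of variables on the Jaccard distance
    between their missingness sets; merging stops once the closest pair of clusters
    is farther apart than threshold (a hypothetical cut rule)."""
    clusters = [[j] for j in keep]

    def linkage(c1, c2):
        dists = [jaccard_distance(masks[a], masks[b]) for a in c1 for b in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        (i, k), d = min(
            (((i, k), linkage(clusters[i], clusters[k]))
             for i, k in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[k]  # i < k, so deleting k keeps i valid
        del clusters[k]
    return clusters


def rank_by_completeness(variables, masks):
    """Order variables from most to least complete (fewest missing values first)."""
    return sorted(variables, key=lambda j: (len(masks[j]), j))


# Hypothetical toy data: columns 2 and 3 share one missing block, column 4 is mostly
# missing, and column 5 has a different block.
rows = [
    [1.0, 2.0, 3.0, 4.0, None, None],
    [1.1, 2.1, 3.1, 4.1, None, None],
    [1.2, 2.2, 3.2, 4.2, None, 5.2],
    [1.3, 2.3, None, None, None, 5.3],
    [1.4, 2.4, None, None, None, 5.4],
    [1.5, 2.5, None, None, 4.5, 5.5],
]
masks = missing_sets(rows)
keep = filter_by_missingness(masks, len(rows), max_missing=0.5)
clusters = cluster_by_missingness(masks, keep, threshold=0.5)
ranked = rank_by_completeness(keep, masks)
print(keep)      # [0, 1, 2, 3, 5]
print(clusters)  # [[0, 1], [2, 3], [5]]
print(ranked)    # [0, 1, 5, 2, 3]
```

Variables that go missing together end up in the same cluster, which is the block-wise structure the method is designed to exploit; a per-cluster model (Group LASSO in the source) would then be fit on the rows where that block is observed.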
Source:
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10399453
DOI: http://dx.doi.org/10.3389/fgene.2023.1162690
Archaeol Anthropol Sci
October 2024
Department of Anthropology, McMaster University, Hamilton, Canada.
Missing data is a prevalent problem in bioarchaeological research, and imputation could provide a promising solution. This work simulated missingness on a control dataset (481 samples × 41 variables) to explore imputation methods for mixed data (qualitative and quantitative). The tested methods included Random Forest (RF), PCA/MCA, factorial analysis for mixed data (FAMD), hotdeck, predictive mean matching (PMM), random samples from observed values (RSOV), and a multi-method (MM) approach, evaluated under the three missingness mechanisms (MCAR, MAR, and MNAR) at 5%, 10%, 20%, 30%, and 40% missingness.
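The snippet does not say how missingness was injected. A minimal sketch of two common approaches follows, assuming values are masked by setting them to `None`: MCAR masks cells chosen uniformly at random, while this illustrative MAR variant masks a target column wherever a fully observed driver column is largest. MNAR (missingness depending on the masked value itself) is not shown, and `inject_mcar` and `inject_mar` are hypothetical helpers, not the study's procedure.

```python
import random


def inject_mcar(rows, rate, seed=0):
    """MCAR: mask exactly round(rate * n_cells) cells chosen uniformly at random,
    independently of any observed or unobserved value."""
    n_rows, n_cols = len(rows), len(rows[0])
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    rng = random.Random(seed)
    chosen = set(rng.sample(cells, round(rate * len(cells))))
    return [[None if (i, j) in chosen else v for j, v in enumerate(row)]
            for i, row in enumerate(rows)]


def inject_mar(rows, rate, target, driver):
    """MAR (threshold form): mask column `target` in the round(rate * n_rows) rows
    with the largest values of the fully observed column `driver`, so missingness
    depends only on observed data."""
    k = round(rate * len(rows))
    order = sorted(range(len(rows)), key=lambda i: rows[i][driver], reverse=True)
    masked = set(order[:k])
    return [[None if (i in masked and j == target) else v for j, v in enumerate(row)]
            for i, row in enumerate(rows)]


# Hypothetical 10 x 4 numeric dataset; the levels mirror those listed above.
data = [[float(i * 4 + j) for j in range(4)] for i in range(10)]
for rate in (0.05, 0.10, 0.20, 0.30, 0.40):
    masked = inject_mcar(data, rate, seed=42)
    n_missing = sum(v is None for row in masked for v in row)
    print(f"MCAR {rate:.0%}: {n_missing} of 40 cells missing")
```

Masking an exact count, rather than dropping each cell independently with probability `rate`, keeps the realized missingness level identical across replicates.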
Med Decis Making
November 2024
Center for Applied Health Research on Aging (CAHRA), Institute for Public Health and Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA.
Background: Context-specific measures with adequate external validity are needed to appropriately determine psychosocial effects related to screening for cognitive impairment.
Methods: Two hundred adults aged ≥65 years who had recently completed routine, standardized cognitive screening as part of their Medicare annual wellness visit were administered an adapted version of the Psychological Consequences of Screening Questionnaire (PCQ), composed of negative (PCQ-Neg) and positive (PCQ-Pos) scales. Measure distribution, acceptability, internal consistency, factor structure, and external validity (construct, discriminative, and criterion) were analyzed.
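The snippet does not name the internal-consistency statistic. Cronbach's alpha is a common choice for scales such as PCQ-Neg and PCQ-Pos; assuming that statistic, it compares the summed item variances with the variance of the total score, $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_j \sigma_j^2}{\sigma_{\text{total}}^2}\right)$. The sketch below is illustrative only, and `cronbach_alpha` is a hypothetical helper.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.
    Sample variances are used throughout; the ratio is unchanged if population
    variances are used consistently instead."""
    k = len(responses[0])
    item_vars = [variance([r[j] for r in responses]) for j in range(k)]
    total_var = variance([sum(r) for r in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 3-item scale answered identically on every item: alpha is 1.
print(cronbach_alpha([[1, 1, 1], [2, 2, 2], [3, 3, 3]]))
```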
Proteomics
September 2024
Department of Molecular Biosciences, University of South Florida, Tampa, Florida, USA.
Targeted proteomics, including parallel reaction monitoring (PRM), is typically used for more precise detection and quantitation of key proteins and/or pathways identified in complex discovery proteomics datasets. Initial discovery analysis using data-independent acquisition (DIA) can achieve deep proteome coverage with low data missingness, while targeted PRM assays can further eliminate missing data and optimize measurement precision. However, developing PRM methods from bioinformatic predictions can be tedious and time-consuming because of the complexity of DIA output.
J Am Med Inform Assoc
December 2024
Division of Computational Health Sciences, Department of Surgery, University of Minnesota, Minneapolis, MN 55455, United States.