Identifying which variables influence a response while controlling false positives is a pervasive problem in statistics and data science. In this paper, we consider a scenario in which we only have access to summary statistics, such as the marginal empirical correlations between each variable of potential interest and the response. This situation may arise due to privacy concerns, e.g., to avoid the release of sensitive genetic information. We extend GhostKnockoffs (He et al., 2022) and introduce variable selection methods based on penalized regression that achieve false discovery rate (FDR) control. We report empirical results from extensive simulation studies, demonstrating enhanced performance over previous work. We also apply our methods to genome-wide association studies of Alzheimer's disease and observe a significant improvement in power.
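To make the notion of summary statistics concrete, the sketch below computes the marginal empirical (Pearson) correlation between each column of a design matrix and the response. It is only an illustration of the input the abstract describes, not the paper's method: the function name `marginal_correlations`, the simulated data, and the assumption that no column is constant are all ours.

```python
import numpy as np


def marginal_correlations(X, y):
    """Pearson correlation between each column of X and the response y.

    Hypothetical helper for illustration; assumes no column of X (and not y)
    is constant, otherwise the denominator is zero.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)          # center each variable
    yc = y - y.mean()                # center the response
    num = Xc.T @ yc                  # cross-products, one per variable
    den = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return num / den


# Simulated example (assumed data): only the first variable affects y.
rng = np.random.default_rng(0)
n, p = 500, 5
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] + rng.standard_normal(n)

r = marginal_correlations(X, y)      # vector of p marginal correlations
```

In a summary-statistics setting, only a vector such as `r` (possibly together with correlations among the variables) would be shared, while the individual-level `X` and `y` stay private.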

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10925382PMC
