Bayesian LASSO for population stratification correction in rare haplotype association studies.

Stat Appl Genet Mol Biol

Department of Statistics, The Ohio State University, Columbus, OH 43210, USA.

Published: January 2024

AI Article Synopsis

  • Population stratification (PS) can confound results in SNP and haplotype studies, leading researchers to use principal component regression (PCR) and linear mixed models (LMM) despite their limitations.
  • This paper introduces a new method called QBLstrat, based on the Bayesian LASSO framework, which effectively addresses PS when identifying haplotypes linked to continuous traits.
  • QBLstrat outperforms existing methods by controlling false positives and maintaining good power in detecting true associations, as demonstrated through simulations and real-world data analysis.

Article Abstract

Population stratification (PS) is a major source of confounding in both single nucleotide polymorphism (SNP) and haplotype association studies. To address PS, principal component regression (PCR) and linear mixed models (LMM) are the current standards for SNP associations, and both are commonly borrowed for haplotype studies. However, the underfitting and overfitting problems introduced by PCR and LMM, respectively, have yet to be addressed. Furthermore, only a few theoretical approaches have been proposed to address PS specifically for haplotypes. In this paper, we propose a new method under the Bayesian LASSO framework, QBLstrat, to account for PS in identifying rare and common haplotypes associated with a continuous trait of interest. QBLstrat utilizes a large number of principal components (PCs) with appropriate priors to sufficiently correct for PS, while shrinking the estimates of unassociated haplotypes and PCs. We compare the performance of QBLstrat with the Bayesian counterparts of PCR and LMM and with a current method, haplo.stats. Extensive simulation studies and real data analyses show that QBLstrat is superior in controlling false positives while maintaining competitive power for identifying true positives under PS.
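
The abstract does not state the model itself, so the following is only a generic sketch of how a Bayesian LASSO regression with haplotype and PC covariates is usually written, not the authors' exact specification. The symbols are assumptions introduced here for illustration: y_i is the continuous trait, x_{ij} the haplotype covariates, z_{ik} the principal components, and \lambda the shrinkage parameter. Whether QBLstrat uses the same prior or the same \lambda for haplotype and PC effects is not given in the abstract.

```latex
% Regression of the trait on haplotypes (x) and principal components (z)
y_i = \alpha + \sum_{j=1}^{J} x_{ij}\,\beta_j + \sum_{k=1}^{K} z_{ik}\,\gamma_k + \epsilon_i,
\qquad \epsilon_i \sim \mathcal{N}(0, \sigma^2)

% Standard Bayesian LASSO hierarchy on a coefficient \beta_j
% (an analogous prior could be placed on each \gamma_k)
\beta_j \mid \sigma^2, \tau_j^2 \sim \mathcal{N}(0, \sigma^2 \tau_j^2),
\qquad
\tau_j^2 \mid \lambda \sim \mathrm{Exponential}\!\left(\tfrac{\lambda^2}{2}\right)

% Integrating out \tau_j^2 gives the Laplace (double-exponential) prior
\pi(\beta_j \mid \sigma^2, \lambda) = \frac{\lambda}{2\sigma}
\exp\!\left(-\frac{\lambda\,\lvert \beta_j \rvert}{\sigma}\right)
```

Under a prior of this form, coefficients with little support in the data are pulled toward zero, which is the mechanism the abstract refers to when it describes shrinking the estimates of unassociated haplotypes and PCs.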

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10794901
DOI: http://dx.doi.org/10.1515/sagmb-2022-0034

Similar Publications

Objective: To develop and validate a new prediction model based on LASSO-logistic regression with inflammatory serologic markers for the assessment of carotid plaque stability, providing clinicians with a reliable tool for risk stratification and decision-making in the management of carotid artery disease.

Methods: In this study, we retrospectively collected the data of the patients who underwent carotid endarterectomy (CEA) from 2019 to 2023 in Nanjing Drum Tower Hospital. Demographic characteristics, vascular risk factors, and the results of preoperative serum biochemistry were measured and collected.

View Article and Find Full Text PDF

Purpose: To evaluate the efficacy of prominent machine learning algorithms in predicting normal tissue complication probability using clinical data obtained from 2 distinct disease sites and to create a software tool that facilitates the automatic determination of the optimal algorithm to model any given labeled data set.

Methods And Materials: We obtained 3 sets of radiation toxicity data (478 patients) from our clinic: gastrointestinal toxicity, radiation pneumonitis, and radiation esophagitis. These data comprised clinicopathological and dosimetric information for patients diagnosed with non-small cell lung cancer and anal squamous cell carcinoma.

View Article and Find Full Text PDF

Chronic kidney disease (CKD) involves numerous variables, but only a few significantly impact the classification task. The statistically equivalent signature (SES) method, inspired by constraint-based learning of Bayesian networks, is employed to identify essential features in CKD. Unlike conventional feature selection methods, which typically focus on a single set of features with the highest predictive potential, the SES method can identify multiple predictive feature subsets with similar performance.

View Article and Find Full Text PDF

Using dense genomic markers opens up new opportunities and challenges for breeding programs. The need to penalize marker-specific regression coefficients becomes particularly important when dense markers are available. Therefore, fitting the marker effects to observations using a regularization technique, such as Bayesian LASSO (BL) regression, is of great interest.

View Article and Find Full Text PDF

The causes of visual impairment are complex and may be influenced by exposure to environmental pollutants. Using data from the 2003-2004 National Health and Nutrition Examination Survey (NHANES), we examined the association between exposure to ten polycyclic aromatic hydrocarbons (PAHs) and vision problems in 1149 U.S.

View Article and Find Full Text PDF
