Missing genotype data in a candidate gene association study can make it difficult to model the effects of multiple genetic variants simultaneously. In particular, when regression models are used to model phenotype as a function of SNP genotypes in several different genes, the most common approach is a complete case analysis, in which only individuals with no missing genotypes are included. However, this can substantially reduce the sample size and thus lead to potential bias and loss of efficiency. A number of other methods for handling missing data are applicable but have rarely been used in this context. The purpose of this paper is to describe how several standard methods for handling missing data can be applied or adapted to this problem, and to compare their performance in a simulation study. We demonstrate these techniques using an Alzheimer's disease association study. We show that the expectation-maximization algorithm and multiple imputation with a bootstrapped expectation-maximization sampling algorithm have the best properties of all the estimators studied.
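The sample-size cost of a complete case analysis can be illustrated with a minimal sketch. It is not the simulation design used in the paper: it assumes each genotype is missing independently and completely at random, and the function names, sample size, number of SNPs, minor allele frequency, and missing rate are all hypothetical values chosen for illustration. Under that assumption, the expected fraction of individuals retained is (1 - missing rate) raised to the number of SNPs.

```python
import random


def simulate_genotypes(n_individuals, n_snps, maf, missing_rate, seed=0):
    """Simulate additive SNP genotypes (0, 1, 2) with None marking a missing call.

    Assumption (not from the paper): every genotype is missing independently
    with the same probability, i.e. missing completely at random.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_individuals):
        row = []
        for _ in range(n_snps):
            if rng.random() < missing_rate:
                row.append(None)
            else:
                # Count minor alleles on the two chromosomes.
                row.append(int(rng.random() < maf) + int(rng.random() < maf))
        data.append(row)
    return data


def complete_cases(data):
    """Keep only individuals with no missing genotype at any SNP."""
    return [row for row in data if all(g is not None for g in row)]


def expected_complete_fraction(n_snps, missing_rate):
    """Expected retained fraction under independent missingness."""
    return (1.0 - missing_rate) ** n_snps


# Hypothetical settings: 2000 individuals, 10 SNPs, 5% missing calls per SNP.
genotypes = simulate_genotypes(2000, 10, maf=0.3, missing_rate=0.05, seed=1)
kept = complete_cases(genotypes)
observed_fraction = len(kept) / len(genotypes)
expected_fraction = expected_complete_fraction(10, 0.05)

print(f"individuals retained: {len(kept)} of {len(genotypes)}")
print(f"observed fraction: {observed_fraction:.3f}")
print(f"expected fraction: {expected_fraction:.3f}")
```

With these illustrative settings, a 5% missing rate at each of 10 SNPs leaves only about 60% of individuals in the analysis (0.95 to the 10th power). When missingness is completely at random, as assumed here, the main cost is lost efficiency; bias becomes a concern when missingness depends on genotype or phenotype.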
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7077088 | PMC |
| http://dx.doi.org/10.1159/000273732 | DOI Listing |
Alzheimers Dement
December 2024
Boston University Alzheimer's Disease Research Center, Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA.
Background: Alzheimer's disease (AD) has both genetic and environmental risk factors. Gene-environment interaction may help explain some missing heritability. There is strong evidence for cigarette smoking as a risk factor for AD.
Alzheimers Dement
December 2024
Columbia University, New York, NY, USA.
Background: Alzheimer's disease (AD) missing heritability remains extensive despite numerous genetic risk loci identified by genome-wide association or sequencing studies. This has been attributed, at least partially, to mechanisms not currently investigated by traditional single-marker/gene approaches. Polygenic Risk Scores (PRS) aggregate sparse genetic information across the genome to identify individual genetic risk profiles for disease prediction and patient risk stratification.
Alzheimers Dement
December 2024
Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium.
Background: Classical genome-wide association studies (GWAS) of Alzheimer's disease (AD), which have successfully identified over 75 risk loci to date, are limited to the content of the imputation panels, which typically do not cover all types of genetic variation, e.g., tandem repeats encompassing >55% of the human genome.
Alzheimers Dement
December 2024
Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA.
Background: To gain a deeper understanding of underlying molecular mechanisms in genomic regions associated with Alzheimer's disease (AD), the National Institute on Aging (NIA) launched the Alzheimer's Disease Sequencing Project (ADSP) Functional Genomics Consortium (FunGen-AD) in 2021.
Method: The first effort of this collaboration, coordinated by the NIA Genetics of Alzheimer's Disease Data Storage Site (NIAGADS), aggregated functional genomics (FG) data from 5 cohorts, including ∼3,000 samples of European (EA) and African ancestries (AA). We used this data to map Quantitative Trait Loci (xQTL) on AD-specific human tissues and cells, providing insights into how non-coding genetic variants contribute to AD risk.
Mol Ecol Resour
January 2025
Section for Molecular Ecology and Evolution, Globe Institute, University of Copenhagen, Copenhagen, Denmark.
Reduced representation sequencing (RRS) has proven to be a cost-effective solution for sequencing subsets of the genome in non-model species for large-scale studies. However, the targeted nature of RRS approaches commonly introduces large amounts of missing data, leading to reduced statistical power and biased estimates in downstream analyses. Genotype imputation, the statistical inference of missing sites across the genome, is a powerful alternative to overcome the caveats associated with missing sites.