Probability of detecting disease-associated single nucleotide polymorphisms in case-control genome-wide association studies.

Biostatistics

Division of Cancer Epidemiology and Genetics, National Cancer Institute, 6120 Executive Boulevard, EPS 8032, Bethesda, MD 20892-7244, USA.

Published: April 2008

Some case-control genome-wide association studies (CCGWASs) select promising single nucleotide polymorphisms (SNPs) by ranking their p-values rather than by applying the same p-value threshold to each SNP. For such a study, we define the detection probability (DP) for a specific disease-associated SNP as the probability that the SNP will be "T-selected," namely that it will have one of the T largest chi-square values (equivalently, one of the T smallest p-values) from trend tests of association. The corresponding proportion positive (PP) is the fraction of selected SNPs that are true disease-associated SNPs. We study DP and PP analytically and via simulations, under both fixed-effects and random-effects models of genetic risk that allow for heterogeneity in genetic risk. DP increases with genetic effect size and with case-control sample size, and it decreases with the number of non-disease-associated SNPs, mainly through the ratio of T to N, the total number of SNPs. We show that DP increases very slowly with T and that the increment in DP per unit increase in T declines rapidly as T grows. DP is also diminished if the number of true disease SNPs exceeds T. For a genetic odds ratio per minor disease allele of 1.2 or less, even a CCGWAS with 1000 cases and 1000 controls requires T to be impractically large to achieve an acceptable DP, leading to PP values so low as to make the study futile and misleading. We further calculate the sample size of the initial CCGWAS that minimizes the total cost of a research program that also includes follow-up studies to examine the T-selected SNPs. A large initial CCGWAS is desirable if genetic effects are small or if the cost of a follow-up study is large.
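The dependence of DP on effect size, sample size and the ratio T/N can be illustrated with a rough fixed-effects calculation. This is a minimal sketch, not the authors' method. It assumes a single disease-associated SNP among N independent SNPs (no linkage disequilibrium), a rare disease, Hardy-Weinberg equilibrium in cases and controls, and a multiplicative per-allele odds ratio. It treats the disease SNP's trend statistic as a 1-df noncentral chi-square and approximates the T-th largest null chi-square value by the (1 - T/N) quantile of a 1-df chi-square distribution. The function names and example inputs (minor-allele frequency 0.3, N = 300,000) are hypothetical.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, sd 1) from the Python standard library.
STD_NORMAL = NormalDist()


def case_allele_freq(p0: float, odds_ratio: float) -> float:
    """Minor-allele frequency in cases under a multiplicative per-allele
    odds ratio, assuming a rare disease so controls have frequency p0
    (hypothetical helper)."""
    return p0 * odds_ratio / (1.0 - p0 + p0 * odds_ratio)


def trend_noncentrality(p0: float, odds_ratio: float,
                        n_cases: int, n_controls: int) -> float:
    """Approximate noncentrality of the 1-df trend chi-square statistic,
    assuming Hardy-Weinberg equilibrium so the genotype-score (0/1/2)
    variance is 2 * p * (1 - p) (hypothetical helper)."""
    p1 = case_allele_freq(p0, odds_ratio)
    # Pooled allele frequency, weighted by group size.
    p_bar = (n_cases * p1 + n_controls * p0) / (n_cases + n_controls)
    # Mean genotype scores differ by 2 * (p1 - p0); divide the squared
    # difference by its variance 2 * p_bar * (1 - p_bar) * (1/R + 1/S).
    return (2.0 * (p1 - p0) ** 2
            / (p_bar * (1.0 - p_bar) * (1.0 / n_cases + 1.0 / n_controls)))


def detection_probability(lam: float, T: int, N: int) -> float:
    """Approximate DP: probability that a SNP with noncentrality lam has one
    of the T largest chi-square values among N SNPs."""
    # Threshold c is the (1 - T/N) quantile of chi-square(1). Because
    # chi-square(1) = Z**2, sqrt(c) is the (1 - T/(2N)) normal quantile.
    root_c = STD_NORMAL.inv_cdf(1.0 - T / (2.0 * N))
    mu = lam ** 0.5
    # P((Z + mu)**2 > c) for Z ~ N(0, 1): upper tail plus lower tail.
    return (1.0 - STD_NORMAL.cdf(root_c - mu)) + STD_NORMAL.cdf(-root_c - mu)


def expected_proportion_positive(dp: float, n_disease: int, T: int) -> float:
    """Expected PP when each of n_disease SNPs has detection probability dp:
    expected true positives among the T selected SNPs, divided by T."""
    return n_disease * dp / T


if __name__ == "__main__":
    # Hypothetical inputs: MAF 0.3, OR 1.2, 1000 cases, 1000 controls.
    lam = trend_noncentrality(0.3, 1.2, 1000, 1000)
    for T in (10, 100, 1000, 10000):
        dp = detection_probability(lam, T, N=300_000)
        pp = expected_proportion_positive(dp, n_disease=1, T=T)
        print(f"T={T:>6}  DP={dp:.3f}  E[PP]={pp:.5f}")
```

Because exactly T SNPs are always selected, the expected PP with D disease SNPs of equal DP is D x DP / T, so enlarging T to raise DP for small odds ratios drives PP down. This single-SNP approximation does not capture the further loss of DP when the number of true disease SNPs exceeds T, nor the random-effects models.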

Source
http://dx.doi.org/10.1093/biostatistics/kxm032

Similar Publications

The COVID-19 pandemic has underscored the importance of virus surveillance in public health, and wastewater-based epidemiology (WBE) has emerged as a non-invasive, cost-effective method for monitoring SARS-CoV-2 and its variants at the community level. Unfortunately, current variant surveillance methods depend heavily on updated genomic databases with data derived from clinical samples, which can become less sensitive and representative as clinical testing and sequencing efforts decline. In this paper, we introduce HERCULES (High-throughput Epidemiological Reconstruction and Clustering for Uncovering Lineages from Environmental SARS-CoV-2), an unsupervised method that uses long-read sequencing of a single 1 Kb fragment of the Spike gene.

scATAC-seq generates more accurate and complete regulatory maps than bulk ATAC-seq.

Sci Rep

January 2025

MRC WIMM Centre for Computational Biology, MRC Weatherall Institute of Molecular Medicine, Radcliffe Department of Medicine, University of Oxford, Oxford, OX3 9DS, UK.

Bulk ATAC-seq assays have been used to map and profile the chromatin accessibility of regulatory elements such as enhancers, promoters, and insulators. This has provided great insight into the regulation of gene expression in many cell types in a variety of organisms. To date, ATAC-seq has most often been used to provide an average evaluation of chromatin accessibility in populations of cells.

Resolving the molecular basis of a Mendelian condition remains challenging owing to the diverse mechanisms by which genetic variants cause disease. To address this, we developed a synchronized long-read genome, methylome, epigenome and transcriptome sequencing approach, which enables accurate single-nucleotide, insertion-deletion and structural variant calling and diploid de novo genome assembly. This permits the simultaneous elucidation of haplotype-resolved CpG methylation, chromatin accessibility and full-length transcript information in a single long-read sequencing run.

Marek's disease (MD), a T cell lymphoma disease in chickens, is caused by the Marek's disease virus (MDV) found ubiquitously in the poultry industry. Genetically resistant Line 6 (L6) and susceptible Line 7 (L7) chickens have been instrumental to research on avian immune system response to MDV infection. In this study we characterized molecular signatures unique to splenic immune cell types across different genetic backgrounds 6 days after infection.

A subgroup of patients with acute depression show an impaired regulation of the hypothalamic-pituitary-adrenocortical axis, which can be sensitively diagnosed with the combined dexamethasone (dex)/corticotropin releasing hormone (CRH)-test. This neuropathological alteration is assumed to be a result of hyperactive AVP/V1b signalling. Given the complicated procedure of the dex/CRH-test, this study aimed to develop a genetic variants-based alternative approach to predict the outcome of the dex/CRH-test in acute depression.