Real world scenarios in rare variant association analysis: the impact of imbalance and sample size on the power in silico.

BMC Bioinformatics

Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.

Published: January 2019

Background: The development of sequencing techniques and statistical methods provides great opportunities for identifying the impact of rare genetic variation on complex traits. However, little is known about how sample size, case numbers, and the balance of cases versus controls affect both burden and dispersion-based rare variant association methods. For example, Phenome-Wide Association Studies may have a wide range of case and control sample sizes across hundreds of diagnoses and traits, and when statistical methods are applied to rare variants in this setting, it is important to understand the strengths and limitations of the analyses.

Results: We conducted a large-scale simulation of randomly selected low-frequency protein-coding regions using twelve different balanced samples with an equal number of cases and controls as well as twenty-one unbalanced sample scenarios. We further explored statistical performance across different minor allele frequency thresholds and a range of genetic effect sizes. Our simulation results demonstrate that an unbalanced study design has an overall higher type I error rate for both burden and dispersion tests than a balanced study design. Regression has an overall higher type I error with balanced cases and controls, while SKAT has higher type I error for unbalanced case-control scenarios. We also found that, under large control group scenarios, both type I error and power were driven by the number of cases in addition to the case-to-control ratio. Based on our power simulations, a SKAT analysis with more than 200 cases in unbalanced case-control models yielded over 90% power with relatively well-controlled type I error. To achieve similar power with regression, over 500 cases are needed. Moreover, SKAT showed higher power than regression to detect associations in unbalanced case-control scenarios.

Conclusions: Our results provide important insights into rare variant association study designs by providing a landscape of type I error and statistical power for a wide range of sample sizes. These results can serve as a benchmark for making decisions about study design for rare variant analyses.
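The kind of type I error and power estimate described in the Results can be illustrated with a small Monte Carlo experiment. The following is only a minimal sketch under stated assumptions, not the authors' pipeline: it replaces the regression-based burden test and SKAT with a simple pooled two-proportion z-test on carrier counts, and every function name, the minor allele frequency, the number of variants, and the sample sizes are hypothetical choices made for illustration.

```python
import math
import random


def carrier_prob(maf, n_variants):
    """Probability that a diploid individual carries at least one minor
    allele across n_variants independent sites with the same MAF."""
    return 1.0 - (1.0 - maf) ** (2 * n_variants)


def two_proportion_p(x1, n1, x2, n2):
    """Two-sided p-value of a pooled two-proportion z-test
    (normal approximation; a stand-in for a burden test)."""
    pooled = (x1 + x2) / (n1 + n2)
    var = pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2)
    if var == 0.0:
        return 1.0  # no carriers or all carriers: no evidence either way
    z = (x1 / n1 - x2 / n2) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2.0))


def rejection_rate(n_cases, n_controls, q_cases, q_controls,
                   alpha=0.05, reps=500, seed=1):
    """Fraction of simulated studies with p < alpha.
    With q_cases == q_controls this estimates type I error;
    with q_cases != q_controls it estimates power."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Draw the number of carriers in each group as a sum of Bernoulli trials.
        x_cases = sum(rng.random() < q_cases for _ in range(n_cases))
        x_controls = sum(rng.random() < q_controls for _ in range(n_controls))
        if two_proportion_p(x_cases, n_cases, x_controls, n_controls) < alpha:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    # Hypothetical gene: 10 rare variants, each with MAF 0.005.
    q = carrier_prob(0.005, 10)
    # Null scenario (no effect): balanced versus unbalanced design.
    print("type I error, 500 vs 500: ", rejection_rate(500, 500, q, q))
    print("type I error, 50 vs 2000: ", rejection_rate(50, 2000, q, q))
    # Alternative scenario: carriers are twice as common among cases.
    print("power, 200 vs 2000:       ", rejection_rate(200, 2000, 2 * q, q))
```

Repeating the null run over a grid of case and control counts gives a rough picture of how the rejection rate drifts from the nominal level as the design becomes more unbalanced, which is the kind of landscape the abstract refers to. The numbers from this toy test should not be expected to match those reported for regression or SKAT.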

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6343276
DOI: http://dx.doi.org/10.1186/s12859-018-2591-6

Similar Publications

We present the R package MIIVefa, designed to implement the MIIV-EFA algorithm. This algorithm explores and identifies the underlying factor structure within a set of variables. The resulting model is not a typical exploratory factor analysis (EFA) model because some loadings are fixed to zero and it allows users to include hypothesized correlated errors such as might occur with longitudinal data.

The effective reproduction number serves as a metric of population-wide, time-varying disease spread. During the early years of the COVID-19 pandemic, this metric was primarily derived from case data, which has varied in quality and representativeness due to changes in testing volume, test-seeking behavior, and resource constraints. Deriving nowcasting estimates from alternative data sources such as wastewater provides complementary information that could inform future public health responses.

Ex vivo imaging-based high content phenotyping of patients with rheumatoid arthritis.

EBioMedicine

December 2024

CeMM Research Centre for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria; Centre for Physiology and Pharmacology, Medical University of Vienna, Vienna, Austria.

Background: High content imaging-based functional precision medicine approaches have been developed and successfully applied in the field of haemato-oncology. For rheumatoid arthritis (RA), treatment selection is still based on a trial-and-error principle, and biomarkers for patient stratification and drug response prediction are needed.

Methods: A high content, high throughput microscopy-based phenotyping pipeline for peripheral blood mononuclear cells (PBMCs) was developed, allowing for the quantification of cell type frequencies, cell type specific morphology and intercellular interactions from patients with RA (n = 65) and healthy controls (HC, n = 33).

Interpreting statistical significance in hominin dimorphism: Power and Type I error rates for resampling tests of univariate and missing-data multivariate size dimorphism estimation methods in the fossil record.

J Hum Evol

December 2024

Department of Anthropology, University at Albany (SUNY), 1400 Washington Avenue, Albany, NY 12222, USA; College of Fellows, Institute of Advanced Study, Durham University, Cosin's Hall, Palace Green, Durham, DH1 3RL, UK; Department of Anthropology, Durham University, Dawson Building, South Road, Durham, DH1 3LE, UK.

The degree of sexual size dimorphism in fossil hominins is important evidence for the evaluation of evolutionary hypotheses, but it is also difficult/impossible to measure directly. Multiple methods have been developed to estimate dimorphism in univariate and multivariate datasets, including when data are missing. This paper introduces 'dimorph', an R package that implements many of these methods and associated resampling-based significance tests and evaluates their performance in terms of Type I error rates and power.

Background: Dialysis Access (DA) stenosis impacts hemodialysis efficiency and patient health, necessitating exams for early lesion detection. Ultrasound is widely used due to its non-invasive, cost-effective nature. Assessing all patients in large hemodialysis facilities strains resources and relies on operator expertise.
