Gene-based burden tests are a popular and powerful approach for analysis of exome-wide association studies. These approaches combine sets of variants within a gene into a single burden score that is then tested for association. Typically, a range of burden scores are calculated and tested across a range of annotation classes and frequency bins.
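As a rough illustration of the collapsing step described above, the sketch below sums rare-allele counts within one gene into a per-sample burden score and regresses a quantitative phenotype on it. The function names, the 1% frequency cutoff, the optional weights, and the normal approximation for the p-value are all assumptions made for this example; it omits covariates and is not the method of any particular study.

```python
import math
import numpy as np

def burden_scores(genotypes, maf, max_maf=0.01, weights=None):
    """Collapse a (samples x variants) allele-count matrix into one score per sample.

    Only variants with minor allele frequency <= max_maf (assumed cutoff) are summed.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    maf = np.asarray(maf, dtype=float)
    keep = maf <= max_maf
    if weights is None:
        weights = np.ones(genotypes.shape[1])
    weights = np.asarray(weights, dtype=float)
    return genotypes[:, keep] @ weights[keep]

def burden_test(phenotype, burden):
    """Simple linear regression of phenotype on burden score.

    Returns (beta, z, p); p uses a two-sided normal approximation.
    """
    y = np.asarray(phenotype, dtype=float)
    x = np.asarray(burden, dtype=float)
    n = len(y)
    xc = x - x.mean()
    yc = y - y.mean()
    sxx = float(xc @ xc)
    if sxx == 0.0:
        raise ValueError("burden score has no variation")
    beta = float(xc @ yc) / sxx
    resid = yc - beta * xc
    sigma2 = float(resid @ resid) / (n - 2)  # residual variance
    se = math.sqrt(sigma2 / sxx)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, z, p
```

In practice this would be repeated for each gene and for each annotation class and frequency bin, with a different `max_maf` or variant mask per run.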
Whole-genome sequencing (WGS), whole-exome sequencing (WES) and array genotyping with imputation (IMP) are common strategies for assessing genetic variation and its association with medically relevant phenotypes. To date, there has been no systematic empirical assessment of the yield of these approaches when applied to hundreds of thousands of samples to enable the discovery of complex trait genetic signals. Using data for 100 complex traits from 149,195 individuals in the UK Biobank, we systematically compare the relative yield of these strategies in genetic association studies.
Few studies have demonstrated reproducible gene-diet interactions (GDIs) impacting metabolic disease risk factors, likely due in part to measurement error in dietary intake estimation and insufficient capture of rare genetic variation. We aimed to identify GDIs across the genetic frequency spectrum impacting the macronutrient-glycemia relationship in genetically and culturally diverse cohorts. We analyzed 33,187 participants free of diabetes from 10 National Heart, Lung, and Blood Institute Trans-Omics for Precision Medicine program cohorts with whole-genome sequencing, self-reported diet, and glycemic trait data.
Expression quantitative trait locus (eQTL) analysis associates SNPs with gene expression; these relationships can be represented as a bipartite network with association strength as "edge weights" between SNPs and genes. However, most eQTL networks use binary edge weights based on thresholded FDR estimates: definitions that influence reproducibility and downstream analyses. We constructed twenty-nine tissue-specific eQTL networks using GTEx data and evaluated a comprehensive set of network specifications based on false discovery rates, test statistics, and p values, focusing on the degree centrality, a metric of an SNP or gene node's potential network influence.
Summary: We developed the variant-Set Test for Association using Annotation infoRmation (STAAR) workflow description language (WDL) workflow to facilitate the analysis of rare variants in whole genome sequencing association studies. The open-access STAAR workflow written in the WDL allows a user to perform rare variant testing for both gene-centric and genetic region approaches, enabling genome-wide, candidate and conditional analyses. It incorporates functional annotations into the workflow as introduced in the STAAR method in order to boost the rare variant analysis power.
Summary: Amidst the continuing spread of coronavirus disease-19 (COVID-19), real-time data analysis and visualization remain critical for the general public to track the pandemic's impact and to inform policy making by officials. Multiple metrics permit the evaluation of the spread, infection and mortality of infectious diseases. For example, numbers of new cases and deaths provide easily interpretable measures of absolute impact within a given population and time frame, while the effective reproduction rate provides an epidemiological measure of the rate of spread.
Sample sizes vary substantially across tissues in the Genotype-Tissue Expression (GTEx) project, where considerably fewer samples are available from certain inaccessible tissues, such as the substantia nigra (SSN), than from accessible tissues, such as blood. This severely limits power for identifying tissue-specific expression quantitative trait loci (eQTL) in undersampled tissues. Here we propose Surrogate Phenotype Regression Analysis (Spray) for leveraging information from a correlated surrogate outcome (eg, expression in blood) to improve inference on a partially missing target outcome (eg, expression in SSN).
Aims: To determine the relationship between hormonal contraceptive (HC) use and painful symptoms, particularly those associated with headache and painful temporomandibular disorders (TMD).
Methods: Data from the Orofacial Pain: Prospective Evaluation and Risk Assessment (OPPERA) prospective cohort study were used. During the 2.
Background: Identifying county-level characteristics associated with high coronavirus 2019 (COVID-19) burden can help allow for data-driven, equitable allocation of public health intervention resources and reduce burdens on health care systems.
Methods: Synthesizing data from various government and nonprofit institutions for all 3142 United States (US) counties, we studied county-level characteristics that were associated with cumulative and weekly case and death rates through 12/21/2020. We used generalized linear mixed models to model cumulative and weekly (40 repeated measures per county) cases and deaths.
Identifying areas with high COVID-19 burden and their characteristics can help improve vaccine distribution and uptake, reduce burdens on health care systems, and allow for better allocation of public health intervention resources. Synthesizing data from various government and nonprofit institutions of 3,142 United States (US) counties as of 12/21/2020, we studied county-level characteristics that are associated with cumulative case and death rates using regression analyses. Our results showed counties that are more rural, counties with more White/non-White segregation, and counties with higher percentages of people of color, in poverty, with no high school diploma, and with medical comorbidities such as diabetes and hypertension are associated with higher cumulative COVID-19 case and death rates.
Traditional classification and prognostic approaches for chronic pain conditions focus primarily on anatomically based clinical characteristics not based on underlying biopsychosocial factors contributing to perception of clinical pain and future pain trajectories. Using a supervised clustering approach in a cohort of temporomandibular disorder cases and controls from the Orofacial Pain: Prospective Evaluation and Risk Assessment study, we recently developed and validated a rapid algorithm (ROPA) to pragmatically classify chronic pain patients into 3 groups that differed in clinical pain report, biopsychosocial profiles, functional limitations, and comorbid conditions. The present aim was to examine the generalizability of this clustering procedure in 2 additional cohorts: a cohort of patients with chronic overlapping pain conditions (Complex Persistent Pain Conditions study) and a real-world clinical population of patients seeking treatment at Duke Innovative Pain Therapies.
Clinical trial results have recently demonstrated that inhibiting inflammation by targeting the interleukin-1β pathway can offer a significant reduction in lung cancer incidence and mortality, highlighting a pressing and unmet need to understand the benefits of inflammation-focused lung cancer therapies at the genetic level. While numerous genome-wide association studies (GWAS) have explored the genetic etiology of lung cancer, there remains a large gap between the type of information that may be gleaned from an association study and the depth of understanding necessary to explain and drive translational findings. Thus, in this study we jointly model and integrate extensive multiomics data sources, utilizing a total of 40 genome-wide functional annotations that augment previously published results from the International Lung Cancer Consortium (ILCCO) GWAS, to prioritize and characterize single nucleotide polymorphisms (SNPs) that increase risk of squamous cell lung cancer through the inflammatory and immune responses.
Large-scale whole-genome sequencing studies have enabled the analysis of rare variants (RVs) associated with complex phenotypes. Commonly used RV association tests have limited scope to leverage variant functions. We propose STAAR (variant-set test for association using annotation information), a scalable and powerful RV association test method that effectively incorporates both variant categories and multiple complementary annotations using a dynamic weighting scheme.
Whole-genome sequencing (WGS) can improve assessment of low-frequency and rare variants, particularly in non-European populations that have been underrepresented in existing genomic studies. The genetic determinants of C-reactive protein (CRP), a biomarker of chronic inflammation, have been extensively studied, with existing genome-wide association studies (GWASs) conducted in >200,000 individuals of European ancestry. In order to discover novel loci associated with CRP levels, we examined a multi-ancestry population (n = 23,279) with WGS (∼38× coverage) from the Trans-Omics for Precision Medicine (TOPMed) program.
Motivation: Cancer genomics studies frequently aim to identify genes that are differentially expressed between clinically distinct patient subgroups, generally by testing single genes one at a time. However, the results of any individual transcriptomic study are often not fully reproducible. A particular challenge impeding statistical analysis is the difficulty of distinguishing between differential expression comprising part of the genomic disease etiology and that induced by downstream effects.
Mediation analysis provides an attractive causal inference framework to decompose the total effect of an exposure on an outcome into natural direct effects and natural indirect effects acting through a mediator. For binary outcomes, mediation analysis methods have been developed using logistic regression when the binary outcome is rare. These methods will not hold in practice when a disease is common.
Comput Stat Data Anal
December 2017
Cluster analysis methods are used to identify homogeneous subgroups in a data set. In biomedical applications, one frequently applies cluster analysis in order to identify biologically interesting subgroups. In particular, one may wish to identify subgroups that are associated with a particular outcome of interest.
Objective: The majority of smokers are not motivated to quit within 30 days. We examined whether these smokers are a homogeneous group, hypothesizing that subtypes of unmotivated smokers could be identified.
Method: Included were 500 smokers not ready to quit within 30 days who completed an online survey assessing variables known to be associated with quitting.