The emergence of biobank-level datasets offers new opportunities to discover novel biomarkers and develop predictive algorithms for human disease. Here, we present an ensemble machine-learning framework (machine learning with phenotype associations, MILTON) utilizing a range of biomarkers to predict 3,213 diseases in the UK Biobank. Leveraging the UK Biobank's longitudinal health record data, MILTON predicts incident disease cases undiagnosed at time of recruitment, largely outperforming available polygenic risk scores.
Integrating human genomics and proteomics can help elucidate disease mechanisms, identify clinical biomarkers and discover drug targets. Because previous proteogenomic studies have focused on common variation via genome-wide association studies, the contribution of rare variants to the plasma proteome remains largely unknown. Here we identify associations between rare protein-coding variants and 2,923 plasma protein abundances measured in 49,736 UK Biobank individuals.
Rare genetic diseases affect millions, and identifying causal DNA variants is essential for patient care. Therefore, it is imperative to estimate the effect of each independent variant and improve their pathogenicity classification. Our study of 140 214 unrelated UK Biobank (UKB) participants found that each of them carries a median of 7 variants previously reported as pathogenic or likely pathogenic.
Synonymous mutations change the DNA sequence of a gene without affecting the amino acid sequence of the encoded protein. Although some synonymous mutations can affect RNA splicing, translational efficiency, and mRNA stability, studies in human genetics, mutagenesis screens, and other experiments and evolutionary analyses have repeatedly shown that most synonymous variants are neutral or only weakly deleterious, with some notable exceptions. Based on a recent study in yeast, there have been claims that synonymous mutations could be as important as nonsynonymous mutations in causing disease, assuming the yeast findings hold up and translate to humans.
Genetic variants showing associations with specific biological traits and diseases detected by genome-wide association studies (GWAS) commonly map to non-coding DNA regulatory regions. Many of these regions are located considerable distances away from the genes they regulate and come into their proximity through 3D chromosomal interactions. We previously developed COGS, a statistical pipeline for linking GWAS variants with their putative target genes based on 3D chromosomal interaction data arising from high-resolution assays such as Promoter Capture Hi-C (PCHi-C).
Background: Genome-wide association studies (GWAS) have identified pervasive sharing of genetic architectures across multiple immune-mediated diseases (IMD). By learning the genetic basis of IMD risk from common diseases, this sharing can be exploited to enable analysis of less frequent IMD where, due to limited sample size, traditional GWAS techniques are challenging.
Methods: Exploiting ideas from Bayesian genetic fine-mapping, we developed a disease-focused shrinkage approach to allow us to distill genetic risk components from GWAS summary statistics for a set of related diseases.
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
Deriving mechanisms of immune-mediated disease from GWAS data remains a formidable challenge, with attempts to identify causal variants being frequently hampered by strong linkage disequilibrium. To determine whether causal variants could be identified from their functional effects, we adapted a massively parallel reporter assay for use in primary CD4 T cells, the cell type whose regulatory DNA is most enriched for immune-mediated disease SNPs. This enabled the effects of candidate SNPs to be examined in a relevant cellular context and generated testable hypotheses into disease mechanisms.
In the version of this article initially published, the bibliographic information for reference 2 was incorrect in the reference list, and reference 2 was cited incorrectly at the end of the second sentence in the second paragraph ("...
Background: Hi-C and capture Hi-C (CHi-C) are used to map physical contacts between chromatin regions in cell nuclei using high-throughput sequencing. Analysis typically proceeds considering the evidence for contacts between each possible pair of fragments independent from other pairs. This can produce long runs of fragments which appear to all make contact with the same baited fragment of interest.
Genome-wide association studies are transformative in revealing the polygenetic basis of common diseases, with autoimmune diseases leading the charge. Although the field is just over 10 years old, advances in understanding the underlying mechanistic pathways of these conditions, which result from a dense multifactorial blend of genetic, developmental and environmental factors, have already been informative, including insights into therapeutic possibilities. Nevertheless, the challenge of identifying the actual causal genes and pathways and their biological effects on altering disease risk remains for many identified susceptibility regions.
Long-range interactions between regulatory elements and gene promoters play key roles in transcriptional regulation. The vast majority of interactions are uncharted, constituting a major missing link in understanding genome control. Here, we use promoter capture Hi-C to identify interacting regions of 31,253 promoters in 17 human primary hematopoietic cell types.
Identification of candidate causal variants in regions associated with risk of common diseases is complicated by linkage disequilibrium (LD) and multiple association signals. Nonetheless, accurate maps of these variants are needed, both to fully exploit detailed cell specific chromatin annotation data to highlight disease causal mechanisms and cells, and for design of the functional studies that will ultimately be required to confirm causal mechanisms. We adapted a Bayesian evolutionary stochastic search algorithm to the fine mapping problem, and demonstrated its improved performance over conventional stepwise and regularised regression through simulation studies.
Seasonal variations are rarely considered a contributing component to human tissue function or health, although many diseases and physiological processes display annual periodicities. Here we find more than 4,000 protein-coding mRNAs in white blood cells and adipose tissue to have seasonal expression profiles, with inverted patterns observed between Europe and Oceania. We also find the cellular composition of blood to vary by season, and these changes, which differ between the United Kingdom and The Gambia, could explain the gene expression periodicity.
The genes and cells that mediate genetic associations identified through genome-wide association studies (GWAS) are only partially understood. Several studies that have investigated the genetic regulation of gene expression have shown that disease-associated variants are over-represented amongst expression quantitative trait loci (eQTL) variants. Evidence for colocalisation of eQTL and disease causal variants can suggest causal genes and cells for these genetic associations.
Copy number variants (CNVs) have been proposed as a possible source of 'missing heritability' in complex human diseases. Two studies of type 1 diabetes (T1D) found null associations with common copy number polymorphisms, but CNVs of low frequency and high penetrance could still play a role. We used the Log-R-ratio intensity data from a dense single nucleotide polymorphism (SNP) array, ImmunoChip, to detect rare CNV deletions (rDELs) and duplications (rDUPs) in 6808 T1D cases, 9954 controls and 2206 families with T1D-affected offspring.
Pathway analysis can complement point-wise single nucleotide polymorphism (SNP) analysis in exploring genomewide association study (GWAS) data to identify specific disease-associated genes that can be candidate causal genes. We propose a straightforward methodology that can be used for conducting a gene-based pathway analysis using summary GWAS statistics in combination with widely available reference genotype data. We used this method to perform a gene-based pathway analysis of a type 1 diabetes (T1D) meta-analysis GWAS (of 7,514 cases and 9,045 controls).
Motivation: Genome-wide association studies (GWAS) have identified many loci implicated in disease susceptibility. Integration of GWAS summary statistics (P-values) and functional genomic datasets should help to elucidate mechanisms.
Results: We extended a non-parametric SNP set enrichment method to test for enrichment of GWAS signals in functionally defined loci to a situation where only GWAS P-values are available.
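To make the idea of a non-parametric enrichment test on P-values concrete, the sketch below compares GWAS P-values for SNPs inside a functionally defined set against background SNPs using a one-sided Wilcoxon rank-sum test. The choice of a rank-sum test, the function name `rank_sum_enrichment`, and the toy P-values are assumptions made for illustration; the abstract does not specify the statistic. The sketch also treats SNPs as independent, so it ignores linkage disequilibrium, which a real analysis would need to account for.

```python
import math


def rank_sum_enrichment(p_in_set, p_background):
    """One-sided Wilcoxon rank-sum test (normal approximation).

    Asks whether P-values for SNPs in the functional set tend to be
    smaller than background P-values. Ties receive average ranks; no
    tie correction is applied to the variance. Hypothetical helper,
    not taken from the source.
    """
    n1, n2 = len(p_in_set), len(p_background)
    pooled = sorted([(p, 0) for p in p_in_set] + [(p, 1) for p in p_background])

    # Assign average ranks to runs of tied P-values.
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    # Rank sum for the in-set SNPs, converted to the Mann-Whitney U statistic.
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = w - n1 * (n1 + 1) / 2

    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u

    # A small U means in-set P-values rank low, so use the lower tail.
    p_enrichment = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return z, p_enrichment


# Toy input: three in-set SNPs with small P-values, four background SNPs.
z_score, p_value = rank_sum_enrichment([0.001, 0.002, 0.003], [0.5, 0.6, 0.7, 0.8])
print(f"z = {z_score:.3f}, one-sided P = {p_value:.4f}")
```

A negative z-score with a small one-sided P-value suggests that the functional set carries smaller GWAS P-values than the background. With so few SNPs the normal approximation is rough, so the toy numbers are indicative only.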