Background: Genome-wide association studies (GWAS) have revealed many brain disorder-associated SNPs residing in the noncoding genome, making it challenging to decipher the underlying pathogenic mechanisms.
Methods: Here, we present an unsupervised Bayesian framework to identify disease-associated genes by integrating risk SNPs with long-range chromatin interactions (iGOAT), including SNP-SNP interactions extracted from ∼500,000 patients and controls from the UK Biobank, and enhancer-promoter interactions derived from multiple brain cell types at different developmental stages.
Findings: Applying iGOAT to three psychiatric disorders and three neurodegenerative/neurological diseases predicted sets of high-risk genes (HRGs) and low-risk genes (LRGs) for each disorder.
Regular, systematic, and independent assessment of computational tools used to predict the pathogenicity of missense variants is necessary to evaluate their clinical and research utility and suggest directions for future improvement. Here, as part of the sixth edition of the Critical Assessment of Genome Interpretation (CAGI) challenge, we assess missense variant effect predictors (or variant impact predictors) on an evaluation dataset of rare missense variants from disease-relevant databases. Our assessment evaluates predictors submitted to the CAGI6 Annotate-All-Missense challenge, predictors commonly used by the clinical genetics community, and recently developed deep learning methods for variant effect prediction.
The pathogenesis of duodenal tumors in the inherited tumor syndromes familial adenomatous polyposis (FAP) and MUTYH-associated polyposis (MAP) is poorly understood. This study aimed to identify genes that are significantly mutated in these tumors and to explore the effects of these mutations. Whole exome and whole transcriptome sequencing identified recurrent somatic coding variants of phosphatidylinositol N-acetylglucosaminyltransferase subunit A (PIGA) in 19/70 (27%) FAP and MAP duodenal adenomas, and further confirmed the established driver roles for APC and KRAS.
Background: De novo mutations (DNMs) are variants that occur anew in the offspring of noncarrier parents. They are not inherited from either parent but rather result from endogenous mutational processes involving errors of DNA repair/replication. These spontaneous errors play a significant role in the causation of genetic disorders, and their importance in the context of molecular diagnostic medicine has become steadily more apparent as more DNMs have been reported in the literature.
Studies have shown that drug targets with human genetic support are more likely to succeed in clinical trials. Hence, a tool integrating genetic evidence to prioritize drug target genes is beneficial for drug discovery. We built a genetic priority score (GPS) by integrating eight genetic features with drug indications from the Open Targets and SIDER databases.
Whilst DNA repeat expansions cause numerous heritable human disorders, their origins and underlying pathological mechanisms are often unclear. We collated a dataset comprising 224 human repeat expansions encompassing 203 different genes, and performed a systematic analysis with respect to key topological features at the DNA, RNA and protein levels. Comparison with controls without known pathogenicity and with genomic regions lacking repeats allowed the construction of the first tool to discriminate repeat regions harboring pathogenic repeat expansions (DPREx).
Recent evidence from proteomics and deep massively parallel sequencing studies has revealed that eukaryotic genomes contain substantial numbers of as-yet-uncharacterized open reading frames (ORFs). We define these uncharacterized ORFs as novel ORFs (nORFs). nORFs in humans are mostly under 100 codons and are found in diverse regions of the genome, including in long noncoding RNAs, pseudogenes, 3' UTRs, 5' UTRs, and alternative reading frames of canonical protein coding exons.
Human genome stability requires efficient repair of oxidized bases, which is initiated via damage recognition and excision by NEIL1 and other base excision repair (BER) pathway DNA glycosylases (DGs). However, the biological mechanisms underlying detection of damaged bases among the million-fold excess of undamaged bases remain enigmatic. Indeed, mutation rates vary greatly within individual genomes, and lesion recognition by purified DGs in the chromatin context is inefficient.
Identifying pathogenic variants and underlying functional alterations is challenging. To this end, we introduce MutPred2, a tool that improves the prioritization of pathogenic amino acid substitutions over existing methods, generates molecular mechanisms potentially causative of disease, and returns interpretable pathogenicity score distributions on individual genomes. Whilst its prioritization performance is state-of-the-art, a distinguishing feature of MutPred2 is the probabilistic modeling of variant impact on specific aspects of protein structure and function that can serve to guide experimental studies of phenotype-altering variants.
Gastroenterology
February 2021
Identifying the molecular programs underlying human organ development and how they differ from model species is key for understanding human health and disease. Developmental gene expression profiles provide a window into the genes underlying organ development and a direct means to compare them across species. We use a transcriptomic resource covering the development of seven organs to characterize the temporal profiles of human genes associated with distinct disease classes and to determine, for each human gene, the similarity of its spatiotemporal expression with its orthologs in rhesus macaque, mouse, rat, and rabbit.
The Human Gene Mutation Database (HGMD) constitutes a comprehensive collection of published germline mutations in nuclear genes that are thought to underlie, or are closely associated with, human inherited disease. At the time of writing (June 2020), the database contains in excess of 289,000 different gene lesions identified in over 11,100 genes manually curated from 72,987 articles published in over 3100 peer-reviewed journals. Two main groups of users utilise HGMD on a regular basis: research scientists and clinical diagnosticians.
Objectives: Monogenic inflammatory bowel disease (IBD) comprises rare Mendelian causes of gut inflammation, often presenting in infants with severe and atypical disease. This study aimed to identify clinically relevant variants within 68 monogenic IBD genes in an unselected pediatric IBD cohort.
Methods: Whole exome sequencing was performed on patients with pediatric-onset disease.
Single nucleotide variants (SNVs) in intronic regions have yet to be systematically investigated for their disease-causing potential. Using known pathogenic and neutral intronic SNVs (iSNVs) as training data, we develop the RegSNPs-intron algorithm based on a random forest classifier that integrates RNA splicing, protein structure, and evolutionary conservation features. RegSNPs-intron showed excellent performance in evaluating the pathogenic impacts of iSNVs.
Each human genome carries tens of thousands of coding variants. The extent to which this variation is functional and the mechanisms by which it exerts its influence remain largely unexplored. To address this gap, we leverage the ExAC database of 60,706 human exomes to investigate experimentally the impact of 2009 missense single nucleotide variants (SNVs) across 2185 protein-protein interactions, generating interaction profiles for 4797 SNV-interaction pairs, of which 421 SNVs segregate at >1% allele frequency in human populations.
Familial adenomatous polyposis (FAP) is characterised by the development of hundreds to thousands of colorectal adenomas and results from inherited or somatic mosaic variants in the APC gene. Index patients with suspected FAP are usually investigated by APC coding region sequence and dosage analysis in a clinical diagnostic setting. The identification of an APC variant which is predicted to alter protein function enables predictive genetic testing to guide the management of family members.
Differentiation between phenotypically neutral and disease-causing genetic variation remains an open and relevant problem. Among different types of variation, non-frameshifting insertions and deletions (indels) represent an understudied group with widespread phenotypic consequences. To address this challenge, we present a machine learning method, MutPred-Indel, that predicts pathogenicity and identifies types of functional residues impacted by non-frameshifting insertion/deletion variation.
It has long been known that canonical 5' splice site (5'SS) GT>GC variants may be compatible with normal splicing. However, to date, the actual scale of canonical 5'SSs capable of generating wild-type transcripts in the case of GT>GC substitutions remains unknown. Herein, combining data derived from a meta-analysis of 45 human disease-causing 5'SS GT>GC variants and a cell culture-based full-length gene splicing assay of 103 5'SS GT>GC substitutions, we estimate that ~15-18% of canonical GT 5'SSs retain their capacity to generate between 1% and 84% normal transcripts when GT is substituted by GC.
Recent genetic studies and whole-genome sequencing projects have greatly improved our understanding of human variation and clinically actionable genetic information. Smaller ethnic populations, however, remain underrepresented in both individual and large-scale sequencing efforts and hence present an opportunity to discover new variants of biomedical and demographic significance. This report describes the sequencing and analysis of a genome obtained from an individual of Serbian origin, introducing tens of thousands of previously unknown variants to the currently available pool.
Background: Mucopolysaccharidosis-IVA (Morquio A disease) is a lysosomal disorder in which the abnormal accumulation of keratan sulfate and chondroitin-6-sulfate is consequent to mutations in the galactosamine-6-sulfatase (GALNS) gene. Since standard DNA sequencing analysis fails to detect about 16% of GALNS mutant alleles, gross DNA rearrangement screening and uniparental disomy evaluation are required to complete the molecular diagnosis. Despite this, the second pathogenic GALNS allele generally remains unidentified in ~5% of Morquio-A disease patients.
Background and Aims: Duodenal polyposis and cancer have become a key issue for patients with familial adenomatous polyposis (FAP) and MUTYH-associated polyposis (MAP). Almost all patients with FAP will develop duodenal adenomas, and 5% will develop cancer. The incidence of duodenal adenomas in MAP appears to be lower than in FAP, but the limited available data suggest a comparable increase in the relative risk and lifetime risk of duodenal cancer.
Many genetic diseases exhibit considerable epidemiological comorbidity and common symptoms, which provokes debate about the extent of their etiological overlap. The rapid growth in the number of known disease-causing mutations in the Human Gene Mutation Database (HGMD) has allowed us to characterize genetic similarities between diseases by ascertaining the extent to which identical genetic mutations are shared between diseases. Using this approach, we show that 41.
Background: Small insertions and deletions (indels) have a significant influence on human disease and, in terms of frequency, they are second only to single nucleotide variants as pathogenic mutations. As the majority of mutations associated with complex traits are located outside the exome, it is crucial to investigate the potential pathogenic impact of indels in non-coding regions of the human genome.
Results: We present FATHMM-indel, an integrative approach to predict the functional effect, pathogenic or neutral, of indels in non-coding regions of the human genome.