Background: Splicing of genomic exons into mRNAs is a critical prerequisite for the accurate synthesis of human proteins. Genetic variants impacting splicing underlie a substantial proportion of genetic disease, but are challenging to identify beyond those occurring at donor and acceptor dinucleotides. To address this, various methods aim to predict variant effects on splicing. Recently, deep neural networks (DNNs) have been shown to achieve better results in predicting splice variants than other strategies.

Methods: It has been unclear how best to integrate such process-specific scores into genome-wide variant effect predictors. Here, we use a recently published experimental data set to compare several machine learning methods that score variant effects on splicing. We integrate the best of those approaches into general variant effect prediction models and observe the effect on classification of known pathogenic variants.

Results: We integrate two specialized splicing scores into CADD (Combined Annotation Dependent Depletion; cadd.gs.washington.edu), a widely used tool for genome-wide variant effect prediction that we previously developed to weight and integrate diverse collections of genomic annotations. With this new model, CADD-Splice, we show that inclusion of splicing DNN effect scores substantially improves predictions across multiple variant categories, without compromising overall performance.

Conclusions: While splice effect scores show superior performance on splice variants, specialized predictors cannot compete with other variant scores in general variant interpretation, as the latter account for nonsense and missense effects that do not alter splicing. Although only shown here for splice scores, we believe that the applied approach will generalize to other specific molecular processes, providing a path for the further improvement of genome-wide variant effect prediction.
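The integration idea summarized above can be illustrated with a minimal sketch. This is not the CADD-Splice implementation: the variant scores below are made-up toy values, the score names are hypothetical, and the unweighted sum is an assumption that stands in for a fitted model. It only shows the mechanics of comparing a general score, a splice-specific score, and a combination of the two by how well each ranks pathogenic above benign variants (area under the ROC curve).

```python
# Toy illustration only: all scores and labels are invented for this sketch.
# Each tuple is (general_score, splice_score, is_pathogenic).
toy_variants = [
    (0.9, 0.0, 1),  # pathogenic, missense-like: general score high, no splice signal
    (0.8, 0.1, 1),
    (0.3, 0.9, 1),  # pathogenic, splice-like: only the splice score is high
    (0.2, 0.8, 1),
    (0.4, 0.1, 0),  # benign
    (0.5, 0.0, 0),
    (0.1, 0.2, 0),
    (0.3, 0.1, 0),
]


def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


labels = [y for _, _, y in toy_variants]
general = [g for g, _, _ in toy_variants]
splice = [s for _, s, _ in toy_variants]
# Assumption: a plain sum stands in for a trained model that weights both features.
combined = [g + s for g, s, _ in toy_variants]

auc_general = auc(general, labels)
auc_splice = auc(splice, labels)
auc_combined = auc(combined, labels)

print(f"general only : {auc_general:.3f}")
print(f"splice only  : {auc_splice:.3f}")
print(f"combined     : {auc_combined:.3f}")
```

On these toy values each single score misses one class of pathogenic variant, while the combined score separates all pathogenic from all benign variants. In a real setting the combination weights would be learned from training data rather than fixed by hand.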

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7901104
DOI: http://dx.doi.org/10.1186/s13073-021-00835-9

Similar Publications

Background: Evidence indicates a negative link between glucosamine and age-related cognitive decline and sarcopenia. However, the causal relationship remains uncertain. This study aims to verify whether glucosamine is causally associated with cognitive function and sarcopenia.

Antidepressants exhibit a considerable variation in efficacy, and increasing evidence suggests that individual genetics contribute to antidepressant treatment response. Here, we combined data on antidepressant non-response measured using rating scales for depressive symptoms, questionnaires of treatment effect, and data from electronic health records, to increase statistical power to detect genomic loci associated with non-response to antidepressants in a total sample of 135,471 individuals prescribed antidepressants (25,255 non-responders and 110,216 responders). We performed genome-wide association meta-analyses, genetic correlation analyses, leave-one-out polygenic prediction, and bioinformatics analyses for genetically informed drug prioritization.

Case-control genome-wide association studies (GWAS) are often used to find associations between genetic variants and diseases. When case-control GWAS are conducted, researchers must make decisions regarding how many cases and how many controls to include in the study. Depending on differing availability and cost of controls and cases, varying case fractions are used in case-control GWAS.

Unlabelled: The maturation of RNA is mediated by the coordinated actions of RNA-binding proteins through post-transcriptional pre-mRNA processing. This process is a central regulatory mechanism for gene expression and plays a crucial role in the development of complex biological systems. MYC directly upregulates transcription of genes encoding the core components of pre-mRNA splicing machinery.

Carotenoids are dietary bioactive compounds with health effects that are biomarkers of fruit and vegetable intake. Here, we examine genetic associations with plasma and skin carotenoid concentrations in two rigorously phenotyped human cohorts (n=317). Analysis of genome-wide SNPs revealed heritability to vary by genetic ancestry (h²=0.
