Quantifying the pathogenicity of protein variants in human disease-related genes would have a marked effect on clinical decisions, yet the overwhelming majority (over 98%) of these variants still have unknown consequences. In principle, computational methods could support the large-scale interpretation of genetic variants. However, state-of-the-art methods have relied on training machine learning models on known disease labels. As these labels are sparse, biased and of variable quality, the resulting models have been considered insufficiently reliable. Here we propose an approach that leverages deep generative models to predict variant pathogenicity without relying on labels. By modelling the distribution of sequence variation across organisms, we implicitly capture constraints on the protein sequences that maintain fitness. Our model EVE (evolutionary model of variant effect) not only outperforms computational approaches that rely on labelled data but also performs on par with, if not better than, predictions from high-throughput experiments, which are increasingly used as evidence for variant classification. We predict the pathogenicity of more than 36 million variants across 3,219 disease genes and provide evidence for the classification of more than 256,000 variants of unknown significance. Our work suggests that models of evolutionary information can provide valuable independent evidence for variant interpretation that will be widely useful in research and clinical settings.
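The abstract's central idea is to fit a model of how a protein varies across organisms and then score a variant by how likely it is under that model. No disease labels are involved. EVE itself uses a deep generative model. The sketch below swaps in a much simpler site-independent profile model to show the same label-free scoring logic: a variant's log-likelihood ratio against the wild type. The alignment, the pseudocount, and the names `site_profile`, `log_likelihood` and `variant_score` are hypothetical illustrations, not EVE's code or API.

```python
import math
from collections import Counter

# Hypothetical toy setup: 20 standard amino acids; gaps ("-") and other symbols are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_profile(msa, pseudocount=1.0):
    """Estimate per-column amino-acid probabilities from aligned homologous sequences.

    Each column is modelled independently (a position-specific profile). The
    pseudocount gives unseen residues a small non-zero probability.
    """
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for i in range(length):
        counts = Counter(seq[i] for seq in msa if seq[i] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return profile


def log_likelihood(seq, profile):
    """Log-probability of a full sequence (standard amino acids only) under the profile."""
    return sum(math.log(profile[i][aa]) for i, aa in enumerate(seq))


def variant_score(wild_type, position, mutant_aa, profile):
    """Log-likelihood ratio log p(mutant) - log p(wild type) for one substitution.

    `position` is 0-based. Negative values mean the substitution is less
    consistent with the variation observed across organisms than the wild type.
    """
    mutant = wild_type[:position] + mutant_aa + wild_type[position + 1:]
    return log_likelihood(mutant, profile) - log_likelihood(wild_type, profile)


if __name__ == "__main__":
    # Toy alignment of four hypothetical homologues (not real data).
    msa = ["MKTAY", "MKTAF", "MRTAY", "MKSAY"]
    profile = site_profile(msa)
    wild_type = "MKTAY"
    for pos, aa in [(0, "W"), (1, "R")]:
        print(f"{wild_type[pos]}{pos + 1}{aa}: {variant_score(wild_type, pos, aa, profile):.3f}")
```

On this toy alignment, M1W scores about -1.609 because methionine is fully conserved at that column. K2R scores about -0.693 because arginine already appears in one homologue. A site-independent model cannot capture interactions between positions. Capturing those dependencies across many organisms is what motivates using a deep generative model instead.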

Source: http://dx.doi.org/10.1038/s41586-021-04043-8
