Machine learning models for accurate prioritization of variants of uncertain significance.

Daniel Mahecha, Haydemar Nuñez, Maria C Lattig, Jorge Duitama

Hum Mutat

Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia.

Published: April 2022

The growing use of next-generation sequencing technologies in genetic diagnosis has produced an exponential increase in the number of variants of uncertain significance (VUS). In this manuscript, we compare three machine learning methods for classifying VUS as Pathogenic or Non-pathogenic: a Random Forest (RF), a Support Vector Machine (SVM), and a multilayer perceptron (MLP). To train the models, we extracted high-quality variants from ClinVar that were previously classified as VUS. For each variant, we retrieved nine conservation scores, a score from the loss-of-function tool, and allele frequencies. For the RF and SVM models, hyperparameters were tuned by cross-validated grid search. The three models were tested on a nonoverlapping set of variants that had been classified as VUS during the preceding 3 years but had been reclassified in August 2020. On this set, all three models achieved higher accuracy than the benchmarked tools. The RF-based model performed best across different variant types and was used to create VusPrize, an open-source software tool for prioritizing VUS. We believe that our model can improve the process of genetic diagnosis in research and clinical settings.
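The abstract states that the RF and SVM hyperparameters were tuned by grid search with cross-validation. The following is a minimal, stdlib-only sketch of that tuning loop, not the authors' code. To stay dependency-free it swaps in a k-nearest-neighbors classifier (not one of the paper's models) as the model being tuned, and it uses a hypothetical three-feature layout (two conservation-like scores plus an allele-frequency-like value) on synthetic toy data rather than ClinVar variants. All helper names (`make_toy_data`, `knn_predict`, `cv_accuracy`, `grid_search`) are hypothetical.

```python
import random
from itertools import product
from statistics import mean


def make_toy_data(n=60, seed=1):
    """Hypothetical toy variants: (features, label) with label 1 = pathogenic.

    Assumed layout: two conservation-like scores and one allele-frequency-like
    value. Pathogenic toy variants get high conservation and low frequency.
    """
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2  # balanced classes
        center = 0.8 if label else 0.2
        feats = [rng.gauss(center, 0.1), rng.gauss(center, 0.1),
                 rng.gauss(1 - center, 0.1)]
        data.append((feats, label))
    return data


def knn_predict(train, x, k, p):
    """Stand-in model: majority vote among the k nearest training variants
    under the Minkowski distance with exponent p."""
    dists = sorted(
        (sum(abs(a - b) ** p for a, b in zip(feats, x)) ** (1 / p), label)
        for feats, label in train
    )
    votes = sum(label for _, label in dists[:k])
    return 1 if 2 * votes > k else 0


def k_fold_indices(n, n_folds, seed=0):
    """Shuffle indices 0..n-1 and deal them into n_folds disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def cv_accuracy(data, params, n_folds=5):
    """Mean held-out accuracy of one hyperparameter setting over k folds."""
    scores = []
    for held_out in k_fold_indices(len(data), n_folds):
        held = set(held_out)
        train = [data[i] for i in range(len(data)) if i not in held]
        test = [data[i] for i in held_out]
        correct = sum(knn_predict(train, f, **params) == y for f, y in test)
        scores.append(correct / len(test))
    return mean(scores)


def grid_search(data, grid, n_folds=5):
    """Try every combination in the grid; keep the best cross-validated one."""
    keys = sorted(grid)
    best_params, best_score = None, -1.0
    for values in product(*(grid[key] for key in keys)):
        params = dict(zip(keys, values))
        score = cv_accuracy(data, params, n_folds)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


if __name__ == "__main__":
    toy = make_toy_data()
    best, acc = grid_search(toy, {"k": [1, 3, 5], "p": [1, 2]})
    print("best hyperparameters:", best, "CV accuracy:", round(acc, 3))
```

In an RF setting the grid would instead range over parameters such as the number of trees or the maximum tree depth; the exhaustive-product-plus-k-fold structure stays the same. The toy scores printed here say nothing about VusPrize's actual performance.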

DOI: http://dx.doi.org/10.1002/humu.24339
