Numerous computational prediction tools have been introduced to estimate the functional impact of variants in the human genome based on evolutionary constraints and biochemical metrics. However, their implementation in diagnostic settings to classify variants faced challenges with accuracy and validity. Most existing tools are pan-genome and pan-diseases, which neglected gene- and disease-specific properties and limited the accessibility of curated data. As a proof-of-concept, we developed a disease-specific prediction tool named Deafness Variant deleteriousness Prediction tool (DVPred) that focused on the 157 genes reportedly causing genetic hearing loss (HL). DVPred applied the gradient boosting decision tree (GBDT) algorithm to the dataset consisting of expert-curated pathogenic and benign variants from a large in-house HL patient cohort and public databases. With the incorporation of variant-level and gene-level features, DVPred outperformed the existing universal tools. It boasts an area under the curve (AUC) of 0.98, and showed consistent performance (AUC = 0.985) in an independent assessment dataset. We further demonstrated that multiple gene-level metrics, including low complexity genomic regions and substitution intolerance scores, were the top features of the model. A comprehensive analysis of missense variants showed a gene-specific ratio of predicted deleterious and neutral variants, implying varied tolerance or intolerance to variation in different genes. DVPred explored the utility of disease-specific strategy in improving the deafness variant prediction tool. It can improve the prioritization of pathogenic variants among massive variants identified by high-throughput sequencing on HL genes. It also shed light on the development of variant prediction tools for other genetic disorders.
Download full-text PDF |
Source |
---|---|
http://dx.doi.org/10.1007/s00439-022-02440-1 | DOI Listing |
Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!