Variants of uncertain significance (VUS) are variants that lack sufficient evidence to be confidently associated with a disease, and they therefore pose a challenge in the interpretation of genetic testing results. Here we report an improved method for predicting the impact of VUS in the Arylsulfatase A (ARSA) gene, developed as part of the Critical Assessment of Genome Interpretation challenge (CAGI6). Our method uses a transfer learning approach that leverages a pre-trained protein language model to predict the impact of mutations on the activity of the ARSA enzyme, whose deficiency is known to cause a rare genetic disorder, metachromatic leukodystrophy. Our innovative framework combines zero-shot log odds scores and embeddings from ESM, an evolutionary scale model, as features for training a supervised model on gene variants functionally related to the ARSA gene. The zero-shot log odds score captures generic protein properties that the model learned during pre-training on millions of sequences in the UniProt data, while the ESM embeddings for the proteins in the ARSA family capture features specific to the family. We also tested our approach on another enzyme, N-acetyl-glucosaminidase (NAGLU), that belongs to the same superfamily as ARSA. Our results demonstrate that the performance of our family models (augmented ESM models) is comparable to or better than that of the ESM models. The ARSA model compares favorably with the majority of state-of-the-art predictors on the area under the precision-recall curve (AUPRC) metric, and the NAGLU model outperforms all pathogenicity predictors evaluated in this study on the same metric. The improved AUPRC is relevant in a diagnostic setting, where variant prioritization generally entails identifying a small number of pathogenic variants among a larger number of benign variants.
Our results also indicate that, for genes with sparse or no experimental variant impact data, family variant data can serve as proxy training data for making accurate predictions. Attention analysis of active sites and binding sites in the ARSA and NAGLU proteins sheds light on probable mechanisms of pathogenicity for positions that are highly attended.
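The feature construction and the evaluation metric described in the abstract can be illustrated with a minimal sketch. It assumes that the zero-shot log odds score is the log-probability of the mutant residue minus that of the wild-type residue at the masked position, which is a common formulation; the abstract does not give the exact definition. The logits and the embedding below are toy stand-ins for real language model outputs, and every name (`AMINO_ACIDS`, `zero_shot_log_odds`, `variant_features`, `average_precision`) is hypothetical rather than taken from the paper. AUPRC is approximated by step-wise average precision with no special handling of tied scores.

```python
import math

# Hypothetical column order for the 20 standard residues (an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_softmax(logits):
    """Convert raw scores to log-probabilities in a numerically stable way."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_norm for x in logits]


def zero_shot_log_odds(position_logits, wild_type, mutant):
    """Assumed score: log p(mutant) - log p(wild type) at one masked position."""
    log_probs = log_softmax(position_logits)
    return log_probs[AMINO_ACIDS.index(mutant)] - log_probs[AMINO_ACIDS.index(wild_type)]


def variant_features(position_logits, embedding, wild_type, mutant):
    """Concatenate the log odds score with an embedding into one feature vector."""
    score = zero_shot_log_odds(position_logits, wild_type, mutant)
    return [score] + list(embedding)


def average_precision(labels, scores):
    """Step-wise AUPRC estimate: mean precision at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Toy stand-in for model output at one position: wild type "A", mutant "P".
toy_logits = [0.0] * len(AMINO_ACIDS)
toy_logits[AMINO_ACIDS.index("A")] = 2.0
toy_logits[AMINO_ACIDS.index("P")] = -1.0

# Toy 3-dimensional embedding; real embeddings have far more dimensions.
toy_features = variant_features(toy_logits, [0.1, -0.4, 0.7], "A", "P")

# Toy labels (1 = pathogenic) and predicted pathogenicity scores.
toy_ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```

In a real pipeline the feature vectors would be passed to a supervised model trained on the family variant data. A ranking-based metric such as average precision suits the setting the abstract describes, where a few pathogenic variants must be ranked above many benign ones.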
DOI: http://dx.doi.org/10.1007/s00439-025-02727-z
Nutr Metab (Lond)
January 2025
School of Basic Medical Sciences, Hubei University of Chinese Medicine, Wuhan, Hubei, 430065, China.
Background: This study aims to explore the interplay between body mass index (BMI), neutrophils, triglyceride levels, and uric acid (UA). Understanding the causal correlation between UA and health indicators, specifically its association with the body's inflammatory conditions, is crucial for preventing and managing various diseases.
Methods: A retrospective analysis was conducted on 4,286 cases utilizing the Spearman correlation method.
Background: Drivers of COVID-19 severity are multifactorial and include multidimensional and potentially interacting factors encompassing viral determinants and host-related factors (i.e., demographics, pre-existing conditions and/or genetics), thus complicating the prediction of clinical outcomes for different severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants.
BMC Genomics
January 2025
Cannabis Innovation and Research Center, Université de Moncton, Moncton, New-Brunswick, Canada.
Background: Due to its previously illicit nature, Cannabis sativa had not fully reaped the benefits of recent innovations in genomics and plant sciences. However, Canada's legalization of C. sativa and products derived from its flower in 2018 triggered significant new demand for robust genotyping tools to assist breeders in meeting consumer demands.
BMC Genomics
January 2025
Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Jiefang Avenue 1095, Wuhan, Hubei, 430030, China.
Background: Left-right (LR) asymmetry disorders present a complex etiology, with genetic factors emerging as a primary contributor. This study aims to explore the genetic underpinnings of chromosomal variants and individual genes in fetuses afflicted with prenatal LR asymmetry disorder.
Methods: Through a retrospective analysis conducted between 2020 and 2023 at Tongji Hospital, Huazhong University of Science and Technology, genetic outcomes of LR asymmetric disorder were scrutinized utilizing copy number variation sequencing (CNV-seq) and whole exome sequencing (WES) methodologies.
J Int AIDS Soc
February 2025
AP-HP, Hôpital Bichat Claude Bernard, Service de Virologie, INSERM, IAME, Paris, France.
Introduction: Molecular surveillance is an important tool for detecting chains of transmission and controlling the HIV epidemic. This can also improve our knowledge of molecular and epidemiological factors for the optimization of prevention. Our objective was to illustrate this by studying the molecular and epidemiological evolution of the cluster including the new circulating recombinant form (CRF) 94_cpx of HIV-1, detected in 2017 and targeted by preventive actions in 2018.