Motivation: Frameshift and nonsense mutations at specific positions in a protein sequence can severely disrupt protein structure. Yet such mutations are also found in healthy individuals, so a method that distinguishes neutral from potentially disease-associated frameshift and nonsense mutations is of practical and fundamental importance. Such a method would let researchers rapidly screen potentially pathogenic sites out of a large number of mutated genes and use them as drug targets, speeding up diagnosis and improving access to treatment. Nevertheless, this problem remains under-researched.
Results: We built TransPPMP, a Transformer-based neural network model that predicts the pathogenicity of frameshift and nonsense mutations from protein features. The input combines the feature matrix of contextual sequences computed by the pre-trained ESM model, the type of the mutated residue, and auxiliary features that include structure and function information; a focal loss function addresses sample imbalance during training. In 10-fold cross-validation and on an independent blind test set, TransPPMP performed robustly and outperformed four other advanced methods (ENTPRISE-X, VEST-indel, DDIG-in and CADD) on all evaluation metrics. In addition, we demonstrate the usefulness of the Transformer's multi-head attention mechanism for predicting the pathogenicity of mutations: multiple self-attention heads learn both local and global interactions, and the attention focus captures functional sites with a large influence on the mutated residue. These findings could offer useful clues for studying the pathogenicity mechanisms of complex human diseases, where traditional machine learning methods fall short.
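The abstract does not state the exact form or hyperparameters of the focal loss used in TransPPMP. As a minimal sketch of the general idea, the block below assumes the standard binary focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), with the commonly used defaults gamma = 2 and alpha = 0.25. The function name `binary_focal_loss` and both default values are illustrative assumptions, not taken from the paper or its repository.

```python
import math


def binary_focal_loss(probs, labels, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss over a batch (illustrative sketch only).

    probs  -- predicted probabilities of the positive (pathogenic) class
    labels -- ground-truth labels, 1 for positive and 0 for negative
    gamma  -- focusing parameter; larger values down-weight easy examples
    alpha  -- weight of the positive class; negatives get 1 - alpha
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if not probs:
        raise ValueError("empty batch")

    total = 0.0
    for p, y in zip(probs, labels):
        # Clamp to avoid log(0).
        p = min(max(p, eps), 1.0 - eps)
        # p_t is the probability the model assigns to the true class.
        p_t = p if y == 1 else 1.0 - p
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # (1 - p_t)^gamma shrinks the loss of well-classified examples.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


if __name__ == "__main__":
    # A confidently correct prediction contributes far less than a wrong one.
    print(binary_focal_loss([0.9], [1]))
    print(binary_focal_loss([0.1], [1]))
```

With gamma = 0 and alpha = 0.5 the expression reduces to half the ordinary cross-entropy, which makes the effect of the focusing term easy to check: raising gamma leaves hard, misclassified examples dominant in the gradient while easy examples from the majority class fade.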
Availability And Implementation: TransPPMP is available at https://github.com/lennylv/TransPPMP.
Supplementary Information: Supplementary data are available at Bioinformatics online.
DOI: http://dx.doi.org/10.1093/bioinformatics/btac188
Int J Mol Sci
October 2024
Medical and Laboratory Genetics Unit, A.O.R.N. "Antonio Cardarelli", 80131 Naples, Italy.
Ital J Dermatol Venerol
August 2024
Unit of Dermatology, Department of Internal Medicine and Medical Specialties, Sapienza University, Rome, Italy.
Hum Mol Genet
July 2024
Human Molecular Genetics, de Duve Institute, University of Louvain, Avenue Hippocrate 74, Brussels 1200, Belgium.
Developmental and functional defects in the lymphatic system are responsible for primary lymphoedema (PL). PL is a chronic debilitating disease caused by increased accumulation of interstitial fluid, predisposing to inflammation, infections and fibrosis. There is no cure; only symptomatic treatment is available.
Am J Med Genet A
May 2024
Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, Indiana, USA.
MBTPS1 (NM_003791.4) encodes Site-1 protease, a serine protease that functions sequentially with Site-2 protease regulating cholesterol homeostasis and endoplasmic reticulum stress response. MBTPS1 pathogenic variants are associated with spondyloepiphyseal dysplasia, Kondo-Fu type (MIM:618392; cataract, alopecia, oral mucosal disorder, and psoriasis-like syndrome, and Silver-Russell-like syndrome).
Front Microbiol
November 2023
Department of Microbiology and Cell Biology, Indian Institute of Science, Bengaluru, India.
The lumpy skin disease virus (LSDV), which mostly affects ruminants and causes huge economic losses, was endemic in Africa, caused outbreaks in the Middle East, and was recently detected in Russia, Serbia, Greece, Bulgaria, Kazakhstan, China, Taiwan, Vietnam, Thailand, and India. However, the roles of evolutionary drivers such as codon selection, negative/purifying selection and APOBEC editing, and of genetic variations such as frameshift and in-frame nonsense mutations, in the LSDVs that cause outbreaks in cattle in various countries are still largely unknown. In the present study, frameshift mutations in the LSDV035, LSDV019, LSDV134, and LSDV144 genes and in-frame nonsense mutations in the LSDV026, LSDV086, LSDV087, LSDV114, LSDV130, LSDV131, LSDV145, LSDV154, LSDV155, LSDV057, and LSDV081 genes were revealed among different clusters.