Motivation: Frameshift and non-sense mutations at specific positions in a protein sequence can severely disrupt protein structure, yet such mutations are also found in healthy individuals. A method that distinguishes neutral from potentially disease-associated frameshift and non-sense mutations is therefore of practical and fundamental importance. It would allow researchers to rapidly pick out potentially pathogenic sites from a large number of mutated genes and then use these sites as drug targets to speed up diagnosis and improve access to treatment. Despite this, the problem remains under-researched.

Results: We built TransPPMP, a Transformer-based neural network model that predicts the pathogenicity of frameshift and non-sense mutations from protein features. The input features combine the feature matrix of contextual sequences computed by the ESM pre-trained model, the type of the mutated residue and auxiliary features, including structure and function information; a focal loss function addresses the sample imbalance problem during training. In 10-fold cross-validation and on an independent blind test set, TransPPMP performed robustly and outperformed four other advanced methods, namely ENTPRISE-X, VEST-indel, DDIG-in and CADD, on all evaluation metrics. In addition, we demonstrate the usefulness of the multi-head attention mechanism in Transformer for predicting the pathogenicity of mutations: not only can multiple self-attention heads learn local and global interactions, but functional sites with a large influence on the mutated residue can also be captured by attention focus. These findings could offer useful clues for studying the pathogenicity mechanisms of complex human diseases for which traditional machine learning methods fall short.
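
To illustrate how a focal loss can counter sample imbalance, here is a minimal sketch of the standard binary focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). It is not taken from the TransPPMP code: the function name focal_loss and the default values gamma=2.0 and alpha=0.25 are assumptions for illustration (they are the commonly cited defaults for this loss), and the abstract does not state the settings the authors used.

```python
import math


def focal_loss(probs, labels, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss over a batch (illustrative sketch only).

    probs  : predicted probabilities of the positive (pathogenic) class
    labels : ground-truth labels, 1 = positive, 0 = negative
    gamma  : focusing parameter; larger values down-weight easy examples more
    alpha  : weight of the positive class (1 - alpha for the negative class)

    gamma and alpha are assumed defaults, not values from the paper.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        # Clamp to avoid log(0).
        p = min(max(p, eps), 1.0 - eps)
        # p_t is the probability assigned to the true class.
        p_t = p if y == 1 else 1.0 - p
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # The (1 - p_t)^gamma factor shrinks the loss of well-classified samples.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


if __name__ == "__main__":
    easy = focal_loss([0.9], [1])   # confident, correct prediction
    hard = focal_loss([0.1], [1])   # confident, wrong prediction
    print(f"easy example loss: {easy:.6f}")
    print(f"hard example loss: {hard:.6f}")
```

With gamma set to 0 the modulating factor disappears and the loss reduces to an alpha-weighted cross-entropy, which makes the effect of the focusing term easy to compare against a plain baseline.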

Availability And Implementation: TransPPMP is available at https://github.com/lennylv/TransPPMP.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
http://dx.doi.org/10.1093/bioinformatics/btac188

Publication Analysis

Top Keywords

frameshift non-sense: 24
non-sense mutations: 20
pathogenicity frameshift: 8
protein features: 8
distinguish neutral: 8
neutral disease-associated: 8
disease-associated frameshift: 8
sites large: 8
predict pathogenicity: 8
frameshift: 6

Similar Publications

Genotype-Phenotype Correlation of GNAS Gene: Review and Disease Management of a Hotspot Mutation.

Int J Mol Sci

October 2024

Medical and Laboratory Genetics Unit, A.O.R.N. "Antonio Cardarelli", 80131 Naples, Italy.

Article Synopsis
  • Defects in the GNAS gene are primarily linked to pseudohypoparathyroidism Ia (PHP1a), with various mutation types identified across all 13 exons.
  • A noteworthy mutation, a 4 bp deletion c.565_568delGACT, is recognized as a mutation hotspot, though focused studies on this variant have been limited.
  • The authors reported two PHP1a cases related to this deletion and found that patients with the c.565_568delGACT mutation showed a higher prevalence of certain characteristics, such as brachydactyly and intellectual disability, than those with other mutations, suggesting a need for tailored patient monitoring.
Article Synopsis
  • Pseudoxanthoma elasticum (PXE) is a rare genetic disease that causes damage to elastic fibers in soft connective tissues, primarily affecting skin and eyes, and is inherited in an autosomal recessive pattern.
  • The study analyzed data from 86 PXE patients in Italy, revealing various genetic mutations and significant cutaneous and ocular symptoms, including skin changes and vision impairment, with additional issues like high blood pressure and liver disease noted.
  • Understanding the characteristics of PXE can help improve patient care and guide the development of better treatment options for those affected by the condition.

Developmental and functional defects in the lymphatic system are responsible for primary lymphoedema (PL). PL is a chronic debilitating disease caused by increased accumulation of interstitial fluid, predisposing to inflammation, infections and fibrosis. There is no cure; only symptomatic treatment is available.


MBTPS1 (NM_003791.4) encodes Site-1 protease, a serine protease that functions sequentially with Site-2 protease regulating cholesterol homeostasis and endoplasmic reticulum stress response. MBTPS1 pathogenic variants are associated with spondyloepiphyseal dysplasia, Kondo-Fu type (MIM:618392; cataract, alopecia, oral mucosal disorder, and psoriasis-like syndrome, and Silver-Russell-like syndrome).


The lumpy skin disease virus (LSDV), which mostly affects ruminants and causes huge economic losses, was endemic in Africa, caused outbreaks in the Middle East, and was recently detected in Russia, Serbia, Greece, Bulgaria, Kazakhstan, China, Taiwan, Vietnam, Thailand, and India. However, the roles of evolutionary drivers such as codon selection, negative/purifying selection, APOBEC editing, and genetic variations such as frameshift and in-frame non-sense mutations in the LSDVs that cause outbreaks in cattle in various countries are still largely unknown. In the present study, frameshift mutations in the LSDV035, LSDV019, LSDV134, and LSDV144 genes and in-frame non-sense mutations in the LSDV026, LSDV086, LSDV087, LSDV114, LSDV130, LSDV131, LSDV145, LSDV154, LSDV155, LSDV057, and LSDV081 genes were revealed among different clusters.

