Motivation: Accurate prediction of O-GlcNAcylation sites is crucial for understanding disease mechanisms and developing effective treatments. Previous machine learning models relied primarily on primary or secondary protein structure and related properties, which are limited in their ability to capture the spatial interactions of neighboring amino acids. This study introduces local environmental features, a novel approach that incorporates three-dimensional spatial information and significantly improves model performance by considering the spatial context around the target site. Additionally, we use sparse recurrent neural networks to capture the sequential nature of proteins and, as an explainable machine learning model, to identify key factors influencing O-GlcNAcylation.
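To make the idea of a spatial context concrete, the following is a minimal sketch of one possible local environment descriptor: counting the residue types that lie within a fixed radius of the target site. The function name `local_environment_counts`, the use of one coordinate per residue, the 8.0 radius, and the toy coordinates are all assumptions for illustration; the abstract does not specify how the study's features are computed.

```python
import math
from collections import Counter


def local_environment_counts(coords, residues, target_index, radius=10.0):
    """Count residue types within `radius` of the target residue.

    coords:   list of (x, y, z) tuples, one per residue (assumed here to be
              a single representative atom per residue).
    residues: one-letter amino acid codes in the same order as coords.
    This is a hypothetical descriptor, not the feature set of the study.
    """
    if len(coords) != len(residues):
        raise ValueError("coords and residues must have the same length")
    target = coords[target_index]
    counts = Counter()
    for i, (xyz, aa) in enumerate(zip(coords, residues)):
        if i == target_index:
            continue  # skip the target site itself
        if math.dist(target, xyz) <= radius:
            counts[aa] += 1
    return counts


# Toy, made-up coordinates: the last residue (T) is far from the target
# in sequence but close in space, so it is counted as a spatial neighbor.
coords = [
    (0.0, 0.0, 0.0),   # S, target site
    (3.8, 0.0, 0.0),   # A
    (7.6, 0.0, 0.0),   # P
    (30.0, 0.0, 0.0),  # G, outside the radius
    (0.0, 5.0, 0.0),   # T
]
residues = "SAPGT"
env = local_environment_counts(coords, residues, target_index=0, radius=8.0)
print(dict(env))
```

A sequence-window feature centered on the target would miss the distant threonine in this toy case, which is the kind of neighbor a three-dimensional descriptor is meant to pick up.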

Results: Our findings demonstrate the effectiveness of the proposed features, with the model achieving an F1 score of 28.3%. They also demonstrate its feature selection capability: the model using only the top 20% of features achieved the highest F1 score, 32.02%, a 1.4-fold improvement over existing PTM models. Statistical analysis of the top 20 features confirmed their consistency with the literature. This method not only boosts prediction accuracy but also paves the way for further research in understanding and targeting O-GlcNAcylation.
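For readers unfamiliar with the two quantities reported here, the sketch below shows how an F1 score is computed from binary predictions and how a "top 20% of features" subset could be taken from a ranking. The importance scores, feature names, and labels are invented for illustration, and ranking by a single importance value is an assumption; the abstract does not state how the study ranks features.

```python
def f1_score(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def top_fraction(importances, fraction=0.2):
    """Return the names of the highest-scoring `fraction` of features."""
    k = max(1, int(len(importances) * fraction))
    ranked = sorted(importances, key=importances.get, reverse=True)
    return ranked[:k]


# Made-up labels: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
score = f1_score(y_true, y_pred)

# Hypothetical importance scores for ten placeholder features.
importances = {
    "f0": 0.05, "f1": 0.30, "f2": 0.01, "f3": 0.12, "f4": 0.02,
    "f5": 0.25, "f6": 0.03, "f7": 0.08, "f8": 0.04, "f9": 0.10,
}
selected = top_fraction(importances, fraction=0.2)
print(round(score, 3), selected)
```

Because F1 ignores true negatives, it is a common choice when positive sites are rare relative to unmodified residues, which is typically the case in PTM site prediction.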

Availability: All code, data, and features used in this study are available in the GitHub repository: https://github.com/pseokyoung/o-glcnac-prediction.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
http://dx.doi.org/10.1093/bioinformatics/btaf034
