Predicting C- and S-linked Glycosylation sites from protein sequences using protein language models.

Comput Biol Med

Department of CSE, BUET, Dhaka 1000, Bangladesh.

Published: March 2025

Among post-translational modifications (PTMs), predicting C-linked and S-linked glycosylation sites (glycosites) is an essential task, yet experimental techniques such as Capillary Electrophoresis (CE), Enzymatic Deglycosylation, and Mass Spectrometry (MS) are expensive. Computational techniques are therefore required to predict these glycosites. Here, different language model embeddings and sequential features were explored. Two feature selection methods, Recursive Feature Elimination (RFE) and Particle Swarm Optimization (PSO), were used to identify the optimal feature set. The final models were chosen on the basis of cross-validation results. Three sampling strategies for handling imbalanced datasets were examined: random undersampling, Synthetic Minority Over-sampling Technique (SMOTE), and Adaptive Synthetic Sampling Approach for Imbalanced Learning (ADASYN). In this study, two models, DeepCSEmbed-C and DeepCSEmbed-S, are proposed for C-linked and S-linked glycosylation prediction, respectively. DeepCSEmbed-C is a dual-branch deep learning model comprising a Feedforward Neural Network (FNN) branch and an Inception branch, coupled with a random undersampling strategy. DeepCSEmbed-S is a Categorical Boosting (CAT) model with the SMOTE oversampling strategy. DeepCSEmbed-C outperformed available state-of-the-art (SOTA) methods, achieving 92.9% sensitivity, 95.1% F1-score and 90.6% MCC on the independent dataset. Datasets and Python scripts for training and testing the models are freely available at https://github.com/nafcoder/DeepCSEmbed.
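
As an illustration only, the sketch below shows what random undersampling and the three reported metrics (sensitivity, F1-score, MCC) compute for a binary site/non-site problem. It is not taken from the authors' repository: the function names `random_undersample` and `binary_metrics`, the 1 = glycosite / 0 = non-site label encoding, and the toy labels are all assumptions made for the example.

```python
import math
import random


def random_undersample(X, y, seed=0):
    """Randomly drop majority-class samples until both classes are the same size.

    Hypothetical helper for illustration; labels are assumed to be 1 (site) or 0 (non-site).
    """
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Keep every minority sample and an equally sized random subset of the majority.
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in kept], [y[i] for i in kept]


def binary_metrics(y_true, y_pred):
    """Compute sensitivity, F1-score and MCC from binary labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "f1": f1, "mcc": mcc}


# Toy data: 3 positive and 5 negative residues (made up for the example).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
X_bal, y_bal = random_undersample(list(range(len(y_true))), y_true)
metrics = binary_metrics(y_true, y_pred)
```

MCC uses all four cells of the confusion matrix, which is why it is often reported alongside sensitivity and F1-score when the classes are imbalanced.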

Source
http://dx.doi.org/10.1016/j.compbiomed.2025.109956

Similar Publications

Synthesis of -Glycoside Building Blocks as Mimetics of the Repeating d-GlcN-α-1,4-d-GlcA Heparan Sulfate Disaccharide.

Molecules

December 2024

School of Chemical and Physical Sciences and Centre for Glycoscience, Keele University, Keele, Staffordshire ST5 5BG, UK.

Heparan sulfate (HS), a sulfated linear carbohydrate that decorates the cell surface and extracellular matrix, is a key regulator of biological processes. Owing to the inherent structural complexity of HS, structure-to-function studies with its ligands are required, and materials to improve the understanding of such interactions are therefore of high importance. Herein, the synthesis of novel -linked GlcN-α(1→4)-GlcA disaccharide building blocks is detailed.

The lack of catalytic stereoselective approaches for producing 1,2--furanosides emphasizes the critical need for further research in this area. Herein, we present a stereoselective -furanosylation method, utilizing a 4,7-dipiperidine-substituted phenanthroline catalyst. This developed protocol fills a gap in the field, enabling the coupling of cysteine residues and thiols with furanosyl bromide electrophiles.

GLYCOCINS: The sugar peppered antimicrobials.

Biotechnol Adv

October 2024

CSIR-Institute of Microbial Technology, Sector 39A, Chandigarh 160036, India; Academy of Scientific and Innovation Research (AcSIR), Ghaziabad 201002, India; Current address: Food Safety and Standards Authority of India (FSSAI), New Delhi 110002, India.

Glycosylated bacteriocins, known as glycocins, were first discovered in 2011. These bioactive peptides are produced by bacteria to gain survival advantages. They exhibit diverse types of glycans and demonstrate varied antimicrobial activity.

Fidaxomicin (Fdx) is a glycosylated natural product with excellent antibacterial activity against various Gram-positive bacteria but is approved only for Clostridioides difficile infections. Poor water solubility and acid lability preclude its use for other infections. Herein, we describe our strategy to overcome the acid lability by introducing acid-stable S-linked glycosides.
