Machine Learning Framework for Conotoxin Class and Molecular Target Prediction.

Toxins (Basel)

Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA.

Published: November 2024

Conotoxins are small and highly potent neurotoxic peptides derived from the venom of marine cone snails which have captured the interest of the scientific community due to their pharmacological potential. These toxins display significant sequence and structure diversity, which results in a wide range of specificities for several different ion channels and receptors. Despite the recognized importance of these compounds, our ability to determine their binding targets and toxicities remains a significant challenge. Predicting the target receptors of conotoxins, based solely on their amino acid sequence, remains a challenge due to the intricate relationships between structure, function, target specificity, and the significant conformational heterogeneity observed in conotoxins with the same primary sequence. We have previously demonstrated that the inclusion of post-translational modifications, collisional cross sections values, and other structural features, when added to the standard primary sequence features, improves the prediction accuracy of conotoxins against non-toxic and other toxic peptides across varied datasets and several different commonly used machine learning classifiers. Here, we present the effects of these features on conotoxin class and molecular target predictions, in particular, predicting conotoxins that bind to nicotinic acetylcholine receptors (nAChRs). We also demonstrate the use of the Synthetic Minority Oversampling Technique (SMOTE)-Tomek in balancing the datasets while simultaneously making the different classes more distinct by reducing the number of ambiguous samples which nearly overlap between the classes. In predicting the alpha, mu, and omega conotoxin classes, the SMOTE-Tomek PCA PLR model, using the combination of the SS and P feature sets establishes the best performance with an overall accuracy (OA) of 95.95%, with an average accuracy (AA) of 93.04%, and an f1 score of 0.959. Using this model, we obtained sensitivities of 98.98%, 89.66%, and 90.48% when predicting alpha, mu, and omega conotoxin classes, respectively. Similarly, in predicting conotoxins that bind to nAChRs, the SMOTE-Tomek PCA SVM model, which used the collisional cross sections (CCSs) and the P feature sets, demonstrated the highest performance with 91.3% OA, 91.32% AA, and an f1 score of 0.9131. The sensitivity when predicting conotoxins that bind to nAChRs is 91.46% with a 91.18% sensitivity when predicting conotoxins that do not bind to nAChRs.

Download full-text PDF

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11598409PMC
http://dx.doi.org/10.3390/toxins16110475DOI Listing

Publication Analysis

Top Keywords

predicting conotoxins
16
conotoxins bind
16
bind nachrs
12
machine learning
8
conotoxin class
8
class molecular
8
molecular target
8
conotoxins
8
remains challenge
8
primary sequence
8

Similar Publications

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!