Cell-penetrating peptides (CPPs) are short functional peptides with high carrying capacity. CPP sequences with targeting functions enable the highly efficient delivery of drugs to target cells. This paper focuses on predicting the cargo category of CPPs: a biocomputational model is constructed to efficiently determine which of the seven known deliverable cargo categories a CPP carries when it acts as a macromolecular carrier. Building on dipeptide composition (DipC), an improved feature representation method for short peptide sequences, general dipeptide composition (G-DipC), is proposed; it effectively enriches the features represented. Linear discriminant analysis (LDA) is then applied to extract important low-dimensional features from G-DipC, and a predictive model is built with the XGBoost algorithm. Experimental results with five-fold cross-validation show that G-DipC improves accuracy by 25 and 5 percent compared with amino acid composition (AAC) and DipC, respectively. G-DipC even outperforms tripeptide composition (TipC). Thus, the proposed model provides a novel resource for the study of cell-penetrating peptides, and the improved dipeptide composition G-DipC can be widely adapted as a feature representation for other biological sequences.
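The abstract does not spell out how G-DipC is computed, so the following is only a minimal sketch of the kind of feature it builds on. `dipeptide_composition` implements standard DipC (the frequency of each of the 400 ordered amino acid pairs). The `gap` parameter and the `gapped_dipeptide_features` helper are assumptions introduced here to illustrate one common way of generalizing DipC for short sequences (counting residue pairs separated by a fixed number of positions); they are not taken from the paper and may differ from the authors' definition.

```python
from itertools import product

# The 20 standard amino acids; other symbols in a sequence are skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order that defines the feature vector.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(seq, gap=0):
    """Return the 400-dimensional frequency vector of residue pairs.

    With gap=0 this is standard dipeptide composition (adjacent residues).
    With gap=g (hypothetical generalization) the pair is (seq[i], seq[i+g+1]).
    """
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    step = gap + 1
    for i in range(len(seq) - step):
        pair = seq[i] + seq[i + step]
        if pair in counts:  # ignore pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:  # sequence too short for this gap
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


def gapped_dipeptide_features(seq, max_gap=2):
    """Concatenate pair frequencies for gaps 0..max_gap (an assumed scheme)."""
    features = []
    for g in range(max_gap + 1):
        features.extend(dipeptide_composition(seq, gap=g))
    return features


if __name__ == "__main__":
    # An arbitrary illustrative peptide, not taken from the paper's dataset.
    vec = gapped_dipeptide_features("GRKKRRQRRRPPQ", max_gap=2)
    print(len(vec))  # 400 features per gap value
```

Under this assumed scheme, each extra gap value adds another 400 features, which is one way a short peptide can yield a richer representation than plain DipC; a dimensionality reduction step such as the LDA mentioned above would then be applied before classification.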

Source
http://dx.doi.org/10.1109/TCBB.2019.2930993
