Discovering nuclear targeting signal sequence through protein language learning and multivariate analysis.

Anal Biochem

Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China; Department of Computer Science, Shanghai Jiao Tong University, Shanghai, 200240, China. Electronic address:

Published: February 2020

Nuclear localization signals (NLSs) are peptides that target proteins to the nucleus by binding to carrier proteins in the cytoplasm that transport their cargo across the nuclear membrane. Accurate identification of NLSs can help elucidate the functions of nuclear protein complexes. The currently known NLS predictors are usually specific to certain species or largely dependent on prior knowledge of NLS basic residues. Thus, a more general predictor is highly desired to reduce the potentially high false positives or false negatives in discovering new NLSs. Here, we report a new method, INSP (Identification Nucleus Signal Peptide), to effectively identify NLS mainly based on statistical knowledge and machine learning algorithms. In our NLS machine learning model, we considered the query protein sequence as text and extracted the sequence context features using a natural language model. These word-vector features encode discriminative knowledge of NLS motif frequency and are thus useful for model recognition. The output of the machine learning model will be fused with statistical knowledge of the query sequence to build a final multivariate regression model for NLS peptide identification. The experimental results demonstrate a promising performance of the new INSP approach. INSP is freely available at: www.csbio.sjtu.edu.cn/bioinf/INSP/for academic use.

Download full-text PDF

Source
http://dx.doi.org/10.1016/j.ab.2019.113565DOI Listing

Publication Analysis

Top Keywords

machine learning
12
knowledge nls
8
statistical knowledge
8
learning model
8
nls
6
model
5
discovering nuclear
4
nuclear targeting
4
targeting signal
4
sequence
4

Similar Publications

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!