Proteins commonly perform biological functions through protein-protein interactions (PPIs). Knowledge of PPI sites is essential for understanding protein functions, disease mechanisms, and drug design. Traditional experimental methods for studying PPI sites have considerable drawbacks, including long experimental times and high labor costs. Many computational methods have therefore been proposed for predicting PPI sites. However, achieving high prediction performance and overcoming severe data imbalance remain challenging. In this paper, we propose a new sequence-based deep learning model called CLPPIS (CNN-LSTM ensemble based PPI Sites prediction). CLPPIS consists of CNN and LSTM components, which capture spatial and sequential features simultaneously. It takes as input a novel feature group comprising 7 physicochemical, biophysical, and statistical properties, and it adopts a batch-weighted loss function to reduce the interference of imbalanced data. Our work suggests that integrating protein spatial and sequential features provides important information for PPI site prediction. Evaluation on three public benchmark datasets shows that CLPPIS significantly outperforms existing state-of-the-art methods.
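The abstract does not define the batch-weighted loss, so the following is only a minimal sketch of one plausible reading: a binary cross-entropy in which class weights are recomputed from the label counts of each batch, so that the rarer interaction-site class is not swamped by non-site residues. The function name `batch_weighted_bce`, the inverse-frequency weighting, and the `eps` clipping constant are assumptions for illustration, not details taken from CLPPIS.

```python
import numpy as np


def batch_weighted_bce(probs, labels, eps=1e-7):
    """Binary cross-entropy with class weights recomputed for each batch.

    ASSUMPTION: weights are inverse class frequencies within the batch.
    The abstract does not specify the actual CLPPIS weighting scheme.
    """
    # Clip predictions away from 0 and 1 so the logarithms stay finite.
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    labels = np.asarray(labels, dtype=float)

    n = labels.size
    n_pos = labels.sum()
    n_neg = n - n_pos

    # Inverse-frequency weights; fall back to 1.0 if a class is absent.
    w_pos = n / (2.0 * n_pos) if n_pos > 0 else 1.0
    w_neg = n / (2.0 * n_neg) if n_neg > 0 else 1.0
    weights = np.where(labels == 1.0, w_pos, w_neg)

    # Per-residue cross-entropy, then a weighted average over the batch.
    losses = -(labels * np.log(probs) + (1.0 - labels) * np.log(1.0 - probs))
    return float(np.sum(weights * losses) / np.sum(weights))


# Hypothetical batch: one interaction site among four residues, all scored 0.1.
example_loss = batch_weighted_bce([0.1, 0.1, 0.1, 0.1], [1, 0, 0, 0])
```

Under this weighting, the missed positive residue in the example batch contributes more to the loss than it would under a plain mean, which is the general effect a loss designed for imbalanced data aims for.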
DOI: http://dx.doi.org/10.1109/TCBB.2023.3306948