Background: Protein-protein interactions (PPIs) are essential to many biochemical processes. Accurate prediction of PPIs can therefore help us better understand the roles of proteins in these processes. Although many biological methods exist for identifying PPIs, they are time-consuming and lack accuracy, so it is necessary to build an efficient and accurate computational model for PPI prediction.

Results: We present a novel sequence-based computational approach, DCSE (Double-Channel-Siamese-Ensemble), to predict potential PPIs. In the encoding layer, we treat each amino acid as a word and map it to an N-dimensional vector. In the feature extraction layer, we extract features from local and global perspectives using a Multilayer Convolutional Neural Network (MCN) and a Multilayer Bidirectional Gated Recurrent Unit with Convolutional Neural Networks (MBC). The output of the feature extraction layer is then fed into the prediction layer, which outputs whether the input protein pair interacts. Both MCN and MBC are built on siamese and ensemble network structures, which effectively improve the performance of the model. To demonstrate our model's performance, we compare it with four machine-learning-based and three deep-learning-based models. The results show that our method outperforms the other models on every evaluation criterion except Precision. The Accuracy, Precision, F1-score, Recall and MCC of our model are 0.9303, 0.9091, 0.9268, 0.9452 and 0.8609, respectively. For the other seven models, the highest Accuracy, Precision, F1-score, Recall and MCC are 0.9288, 0.9243, 0.9246, 0.9250 and 0.8572, respectively. We also test our model on an imbalanced dataset and transfer it to another species; the results show that it performs well in both settings.
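
The five evaluation criteria reported above are standard binary-classification metrics. The sketch below computes them from a confusion matrix in which a positive means "the pair interacts"; the function names `ppi_metrics` and `f1_from` are hypothetical, the zero-denominator convention is an assumption, and the confusion-matrix counts in the usage line are invented for illustration, not taken from the paper. The last line checks that the reported F1-score follows from the reported Precision and Recall.

```python
import math


def safe_div(num, den):
    # Assumed convention: a metric whose denominator is zero is reported as 0.0.
    return num / den if den else 0.0


def ppi_metrics(tp, fp, tn, fn):
    """Accuracy, Precision, Recall, F1-score and MCC from confusion-matrix counts."""
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        # F1 is the harmonic mean of Precision and Recall.
        "f1": safe_div(2 * precision * recall, precision + recall),
        # Matthews correlation coefficient, in [-1, 1].
        "mcc": safe_div(tp * tn - fp * fn,
                        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))),
    }


def f1_from(precision, recall):
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only.
print(ppi_metrics(tp=90, fp=10, tn=85, fn=15))

# Consistency check on the reported DCSE figures (Precision 0.9091, Recall 0.9452).
print(round(f1_from(0.9091, 0.9452), 4))  # 0.9268
```

Only the DCSE row can be checked this way: the "highest" values quoted for the other seven models may come from different models, so they need not satisfy the F1 relation with each other.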

Conclusion: Compared with seven other models, our model achieves the best overall performance. The NLP-based encoding method is effective for the PPI prediction task. MCN and MBC extract protein sequence features from local and global perspectives, and both feature extraction layers are built on siamese and ensemble network structures. The siamese structure keeps the features of the two input proteins consistent, and the ensemble structure effectively improves the accuracy of the model.
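
The encoding, siamese and ensemble ideas summarized above can be illustrated with a small, untrained sketch in plain Python. It is not the authors' DCSE implementation: the vocabulary, layer sizes and seeds, the single convolution layer standing in for MCN, the bidirectional tanh RNN standing in for MBC (it has neither GRU gates nor MBC's convolution), the element-wise product used to combine the two proteins, and the simple averaging ensemble are all assumptions chosen for illustration. What it does show is (1) each amino acid treated as a word and mapped to an N-dimensional vector, (2) one encoder with shared weights applied to both proteins, and (3) a local channel and a global channel whose predictions are averaged.

```python
import math
import random

# Hypothetical vocabulary: the 20 standard amino acids, each treated as a "word".
# Any other letter (e.g. X, U) maps to one shared unknown index.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
VOCAB = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
UNK = len(AMINO_ACIDS)


def rand_matrix(rng, rows, cols, scale=0.5):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class SharedEncoder:
    """One set of weights applied to BOTH proteins of a pair (the siamese idea).

    Weights are random and untrained: this shows the data flow, not a fitted model.
    """

    def __init__(self, dim=8, filters=6, width=3, hidden=4, seed=0):
        rng = random.Random(seed)
        self.width = width
        # Encoding layer: one N-dimensional (here dim=8) vector per amino acid.
        self.embed = rand_matrix(rng, len(AMINO_ACIDS) + 1, dim)
        # Local channel: a single 1D convolution over `width` residues (MCN stand-in).
        self.conv = rand_matrix(rng, filters, dim * width)
        # Global channel: a bidirectional tanh RNN (simplified stand-in for MBC).
        self.w_in = rand_matrix(rng, hidden, dim)
        self.w_rec = rand_matrix(rng, hidden, hidden)

    def encode(self, seq):
        return [self.embed[VOCAB.get(aa, UNK)] for aa in seq.upper()]

    def local_features(self, x):
        if len(x) < self.width:
            raise ValueError("sequence shorter than the convolution width")
        # Slide over every window, apply ReLU, then max-pool each filter over positions.
        windows = [sum(x[i:i + self.width], []) for i in range(len(x) - self.width + 1)]
        acts = [[max(0.0, a) for a in matvec(self.conv, win)] for win in windows]
        return [max(col) for col in zip(*acts)]

    def _run_rnn(self, x):
        h = [0.0] * len(self.w_rec)
        for v in x:
            pre = [a + b for a, b in zip(matvec(self.w_in, v), matvec(self.w_rec, h))]
            h = [math.tanh(p) for p in pre]
        return h

    def global_features(self, x):
        # Final forward state + final backward state summarize the whole sequence.
        return self._run_rnn(x) + self._run_rnn(x[::-1])


class PairClassifier:
    def __init__(self, encoder, seed=1):
        rng = random.Random(seed)
        self.enc = encoder
        self.w_local = [rng.uniform(-1, 1) for _ in range(len(encoder.conv))]
        self.w_global = [rng.uniform(-1, 1) for _ in range(2 * len(encoder.w_rec))]

    def predict(self, seq_a, seq_b):
        xa, xb = self.enc.encode(seq_a), self.enc.encode(seq_b)
        # Element-wise product: the pair representation does not depend on input order.
        loc = [p * q for p, q in zip(self.enc.local_features(xa), self.enc.local_features(xb))]
        glo = [p * q for p, q in zip(self.enc.global_features(xa), self.enc.global_features(xb))]
        p_local = sigmoid(sum(w * f for w, f in zip(self.w_local, loc)))
        p_global = sigmoid(sum(w * f for w, f in zip(self.w_global, glo)))
        # Ensemble: average the interaction probabilities of the two channels.
        return (p_local + p_global) / 2


# Hypothetical input sequences, for illustration only.
model = PairClassifier(SharedEncoder())
prob = model.predict("MKTAYIAKQRQISFVKSHFSRQ", "MSLLTEVETPIRNEWGCRCNDS")
print(f"interaction probability (untrained weights): {prob:.3f}")
```

Because both proteins pass through the same weights and are combined with a symmetric operation, swapping the two proteins leaves the score unchanged, which is one concrete sense in which a siamese structure keeps features consistent. A real model would learn the embedding, convolution and recurrent weights from labelled protein pairs.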

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9351149
DOI: http://dx.doi.org/10.1186/s12864-022-08772-6

Similar Publications

Objective: Despite the identification of various prognostic factors for anaplastic thyroid carcinoma (ATC) patients over the years, a precise prognostic tool for these patients is still lacking. This study aimed to develop and validate a prognostic model for predicting survival outcomes for ATC patients using random survival forests (RSF), a machine learning algorithm.

Methods: A total of 1222 ATC patients were extracted from the Surveillance, Epidemiology, and End Results (SEER) database and randomly divided into a training set of 855 patients and a validation set of 367 patients.

Brain tumors can impair normal brain function and can develop in various regions of the brain. Malignant tumors can grow quickly, invade neighboring tissues, and spread to other brain regions or the central nervous system. In contrast, benign tumors typically grow slowly and do not invade surrounding tissues.

Infrared absorption spectroscopy and surface-enhanced Raman spectroscopy were integrated into three data fusion strategies (hybrid, using concatenated spectra; mid-level, using features extracted from both datasets; and high-level, fusing the predictions of both models) to improve the predictive accuracy of xylazine detection in illicit opioid samples. Three chemometric approaches (random forest, support vector machine, and k-nearest neighbor algorithms) were employed and optimized using a 5-fold cross-validation grid search for all fusion strategies. Validation results identified the random forest classifier as the optimal model for all fusion strategies, achieving high sensitivity (88% for hybrid, 92% for mid-level, and 96% for high-level) and a specificity of 88% for all three strategies.

Introduction: Datura stramonium (DS) possesses strong medicinal and therapeutic potential but has rarely been evaluated in this context.

Methods: The present study was designed to evaluate the antioxidant, hepatoprotective, and nephroprotective potential of the crude methanolic leaf extract of DS and its ethyl acetate, chloroform, n-hexane, and aqueous fractions in paracetamol-intoxicated rabbits. Paracetamol (2 g/kg body weight) was administered to induce liver and kidney injury, while the methanolic extract and fractions of DS were administered at doses of 150 to 300 mg/kg body weight for 21 days.

The challenges of pollution and agro-industrial waste management have led to the development of bioconversion techniques that transform these wastes into valuable products. This has increased the focus on the sustainable and cost-efficient production of biosurfactants from agro-industrial waste. Hence, the present study investigates the production of sophorolipid biosurfactants using the yeast strain IIPL32 under submerged fermentation, employing sugarcane bagasse hydrolysate (a renewable, low-cost agro-industrial waste) as the feedstock.
