Use of tree-based machine learning methods to screen affinitive peptides based on docking data.

Mol Inform

Henan Key Laboratory of Animal Immunology, Henan Academy of Agricultural Sciences, Zhengzhou, China.

Published: December 2023

Screening peptides with good affinity is an important step in peptide-drug discovery. Recent advances in computer and data science have made machine learning a useful tool for accurate screening of affinitive peptides. In the current study, four tree-based algorithms, namely classification and regression trees (CART), the C5.0 decision tree (C50), bagged CART (BAG) and random forest (RF), were employed to explore the relationship between experimental peptide affinities and virtual docking data, and the performance of the models was compared in parallel. All four algorithms performed better on the dataset pre-processed by scaling, centering and PCA than on datasets pre-processed in other ways. After model rebuilding and hyperparameter optimization, the optimal C50 model (C50O) showed the best performance in terms of Accuracy, Kappa, Sensitivity, Specificity, F1, MCC and AUC when validated on the test data and evaluated on an unknown PEDV dataset (Accuracy = 80.4 %). BAG and RFO (the optimal RF), the two best models during training, did not perform as expected in the test-data and unknown-dataset validations. Furthermore, the high correlation of the RFO and BAG predictions with those of C50O implied high stability and robustness of their predictions. By contrast, despite its good performance on the unknown dataset, the poor performance of CARTO in test-data validation and correlation analysis indicated that it could not be used to predict future data. To evaluate peptide affinity accurately, the current study presents the first comparison of tree-based models for affinitive-peptide prediction using virtual docking data, which should expand the application of machine learning algorithms in studying peptide-protein interactions (PepPIs) and benefit the development of peptide therapeutics.
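The workflow summarized in the abstract (center, scale, apply PCA, then compare tree-based classifiers on held-out data) can be illustrated with a minimal Python sketch. This is not the authors' code or data: the features are synthetic stand-ins for docking descriptors, the 95 % explained-variance threshold for PCA and all hyperparameters are assumptions, and C5.0 is omitted because scikit-learn does not provide an implementation of it. Only three of the reported metrics (Accuracy, Kappa, MCC) are computed.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score, matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for docking-derived features (hypothetical, NOT the study's data).
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Three of the four model families named in the abstract; C5.0 is left out
# because scikit-learn has no implementation of it. Hyperparameters are assumed.
models = {
    "CART": DecisionTreeClassifier(random_state=0),
    "BAG": BaggingClassifier(
        DecisionTreeClassifier(random_state=0), n_estimators=50, random_state=0
    ),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
}


def evaluate(models, X_train, y_train, X_test, y_test):
    """Fit each model behind a center/scale/PCA pipeline and score it on test data."""
    results = {}
    for name, model in models.items():
        pipe = make_pipeline(
            StandardScaler(),  # centers and scales each feature
            PCA(n_components=0.95, svd_solver="full"),  # keep 95 % variance (assumed)
            model,
        )
        pipe.fit(X_train, y_train)
        pred = pipe.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, pred),
            "kappa": cohen_kappa_score(y_test, pred),
            "mcc": matthews_corrcoef(y_test, pred),
        }
    return results


results = evaluate(models, X_train, y_train, X_test, y_test)
for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

Wrapping the pre-processing and the classifier in one pipeline means the scaler and PCA are fitted on the training split only, so no information from the test split leaks into the transformation.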

Source
http://dx.doi.org/10.1002/minf.202300143
