Measurements of protein-ligand interactions have reproducibility limits due to experimental error. Any model built on such assays will consequently inherit these unavoidable errors, which should ideally be factored into modelling and output predictions, for example through the standard deviation of the experimental measurements (σ) or the comparability of activity values across aggregated heterogeneous activity units (i.e., Ki versus IC50 values) during dataset assimilation. However, experimental error is usually neglected during model generation. To improve upon the current state of the art, we present a novel approach to predicting protein-ligand interactions using a Probabilistic Random Forest (PRF) classifier. The PRF algorithm was applied to in silico protein target prediction across ~550 tasks from ChEMBL and PubChem. Predictions were evaluated under various scenarios of experimental standard deviation in both training and test sets, and performance was assessed using fivefold stratified shuffled splits for validation. The largest benefit of incorporating the experimental deviation in PRF was observed for data points close to the binary threshold boundary, where the original RF algorithm does not consider such information in any way. For example, when σ ranged between 0.4 and 0.6 log units and the ideal probability estimates lay between 0.4 and 0.6, the PRF outperformed RF with a median absolute error margin of ~17%. In comparison, the baseline RF outperformed PRF for cases with high confidence of belonging to the active class (far from the binary decision threshold), although the RF models gave errors smaller than the experimental uncertainty, which could indicate that they were overtrained and/or over-confident.
Finally, PRF models trained with putative inactives performed worse than PRF models trained without them; this could be because putative inactives were not assigned an experimental pXC50 value and were therefore treated as inactives with low uncertainty (which in practice might not be true). In conclusion, PRF can be useful for target prediction models, in particular for data where class boundaries overlap with the measurement uncertainty and where a substantial part of the training data is located close to the classification threshold.
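To illustrate why measurement uncertainty matters most near the classification threshold, the sketch below converts a measured activity value into a probability of being active. It assumes a Gaussian error model with standard deviation σ centred on the measurement; the abstract does not state how the ideal probability estimates were derived, so this is an assumption, not a description of the authors' implementation. The function name `activity_probability`, the threshold of 6.0 log units, and the example values are all hypothetical.

```python
import math


def activity_probability(pxc50: float, threshold: float, sigma: float) -> float:
    """Probability that the true activity lies at or above `threshold`.

    Assumption (not taken from the abstract): the true value is normally
    distributed around the measured `pxc50` with standard deviation `sigma`.
    """
    if sigma <= 0:
        # No uncertainty: fall back to a hard binary label.
        return 1.0 if pxc50 >= threshold else 0.0
    z = (pxc50 - threshold) / sigma
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


THRESHOLD = 6.0  # hypothetical activity cut-off in log units
SIGMA = 0.5      # hypothetical value inside the 0.4-0.6 range discussed above

for value in (5.0, 5.9, 6.0, 6.1, 7.0):
    p = activity_probability(value, THRESHOLD, SIGMA)
    hard_label = int(value >= THRESHOLD)
    print(f"pXC50={value:.1f}  hard label={hard_label}  P(active)={p:.3f}")
```

Under these assumptions, measurements at 5.9 and 6.1 receive opposite hard labels but nearly identical probabilities, whereas values one log unit from the threshold stay close to 0 or 1. This is consistent with the observation that a probabilistic treatment helps most for data points near the decision boundary.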
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8375213 | PMC |
| http://dx.doi.org/10.1186/s13321-021-00539-7 | DOI Listing |
Genes (Basel)
December 2024
Department of Genetics and Biotechnology, Ivan Franko National University of Lviv, 79005 Lviv, Ukraine.
Background: Glycopeptide antibiotics (GPAs) are a very successful class of clinically relevant antibacterials, used to treat severe infections caused by Gram-positive pathogens, e.g., multidrug resistant and methicillin-resistant staphylococci.
Int J Mol Sci
November 2024
Institute for Cellular and Molecular Immunology, University Medical Center Göttingen, 37073 Göttingen, Germany.
Teratomas are a highly differentiated type of testicular germ cell tumor (TGCT), the most common type of solid cancer in young men. Prominent inflammatory infiltrates are a hallmark of TGCTs, although their composition and dynamics in teratomas remain elusive. Here, we set out to characterize the infiltrating immune cells and their activation and polarization state by using high-throughput gene expression analysis of 129.
FEBS Lett
January 2025
Graduate School of Agriculture and Life Sciences, The University of Tokyo, Bunkyo-ku, Japan.
Serratia sp. ATCC 39006 has two tandemly positioned genes, ser4 and ser5, both annotated as sugar aminotransferases, in a putative secondary metabolite biosynthetic gene cluster. Ser5 possesses a complete fold-type I aminotransferase fold, while Ser4 lacks the N- and C-terminal regions and a catalytically important lysine residue of fold-type I aminotransferase.
BMC Biol
November 2024
Department of Ecology, Evolution and Marine Biology, University of California Santa Barbara, Santa Barbara, USA.
Pharmacol Res
December 2024
Department of Biochemistry and Pharmacology, The University of Melbourne, VIC, Australia; ARC Centre for Personalised Therapeutics Technologies, Melbourne, VIC, Australia.
Many drugs have been discontinued during phase II/III breast cancer clinical trials due to lack of clinical efficacy, indicating shortcomings in predictive value of preclinical data. Nutrient availability in the tumour cell microenvironment and the dimensionality of in vitro tumour cells likely impact on drug responsiveness. Global proteomics experiments were conducted to assess the impact of nutrient availability and dimensionality of culture.