Proteochemometric modeling in a Bayesian framework.

Isidro Cortes-Ciriano, Gerard JP van Westen, Eelke Bart Lenselink, Daniel S Murrell, Andreas Bender, Thérèse Malliavin

J Cheminform

Institut Pasteur, Unité de Bioinformatique Structurale; CNRS UMR 3825; Département de Biologie Structurale et Chimie.

Published: July 2014

Proteochemometrics (PCM) is an approach to predictive bioactivity modeling that relates protein and chemical information. Gaussian Processes (GP), based on Bayesian inference, provide an objective estimate of the uncertainty of the predictions, thus permitting the evaluation of the applicability domain (AD) of the model. Furthermore, the experimental error on bioactivity measurements can be used as input for this probabilistic model. In this study, we apply GP implemented with a panel of kernels to three diverse (and multispecies) PCM datasets. The first dataset consisted of information from eight human and rat adenosine receptors with 10,999 small molecule ligands and their binding affinities. The second consisted of the catalytic activity of four dengue virus NS3 proteases on 56 small peptides. Finally, we gathered bioactivity information of small molecule ligands on 91 aminergic GPCRs from nine different species, leading to a dataset of 24,593 datapoints with a matrix completeness of only 2.43%. GP models trained on these datasets are statistically sound, at the same level of statistical significance as Support Vector Machines (SVM), with [Formula: see text] values on the external dataset ranging from 0.68 to 0.92, and RMSEP values close to the experimental error. Furthermore, the best GP models, obtained with the normalized polynomial and radial kernels, provide confidence intervals for the predictions that agree with the cumulative Gaussian distribution. GP models were also interpreted on the basis of individual targets and of ligand descriptors. In the dengue dataset, interpreting the model in terms of the amino-acid positions in the tetra-peptide ligands gave biologically meaningful results.
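The two ideas in the abstract that lend themselves to a small worked example are (i) GP regression in which the experimental error of each measurement enters as a per-datapoint noise variance, and (ii) checking whether the resulting confidence intervals match the cumulative Gaussian distribution. The sketch below is a dependency-free illustration under stated assumptions, not the authors' implementation: it assumes a zero-mean prior, a radial kernel with unit signal variance and a fixed length scale, and toy one-dimensional descriptors. All function names (`rbf_kernel`, `gp_predict`, `interval_coverage`, `gaussian_coverage`) are hypothetical.

```python
import math


def rbf_kernel(x, y, length_scale=1.0):
    """Radial (squared-exponential) kernel between two descriptor vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))


def cholesky(A):
    """Lower-triangular L with L L^T = A, for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def forward_sub(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def backward_sub(L, y):
    """Solve L^T x = y for lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(X_train, y_train, noise_var, x_new, length_scale=1.0):
    """Posterior mean and variance of the latent function at x_new.

    noise_var[i] is the (assumed known) experimental error variance of
    measurement i; it is added to the diagonal of the kernel matrix.
    """
    n = len(X_train)
    K = [[rbf_kernel(X_train[i], X_train[j], length_scale)
          + (noise_var[i] if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = backward_sub(L, forward_sub(L, y_train))  # K^{-1} y
    k_star = [rbf_kernel(x, x_new, length_scale) for x in X_train]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = forward_sub(L, k_star)
    var = rbf_kernel(x_new, x_new, length_scale) - sum(vi * vi for vi in v)
    return mean, max(var, 0.0)


def interval_coverage(y_true, means, stds, z):
    """Fraction of observed values lying within mean +/- z * std."""
    inside = sum(abs(t - m) <= z * s for t, m, s in zip(y_true, means, stds))
    return inside / len(y_true)


def gaussian_coverage(z):
    """Fraction of a Gaussian expected within +/- z standard deviations."""
    return math.erf(z / math.sqrt(2.0))
```

In this sketch, a query far from every training point returns the prior variance, which is one way a GP can flag compounds outside the applicability domain. Comparing `interval_coverage` on a held-out set with `gaussian_coverage` for several values of `z` gives a simple calibration check; when comparing against noisy measurements, the experimental error variance would normally be added to the predicted variance first.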

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4083135
DOI: http://dx.doi.org/10.1186/1758-2946-6-35

Similar Publications

Imputation for Lipidomics and Metabolomics (ImpLiMet): a web-based application for optimization and method selection for missing data imputation.

Bioinform Adv

January 2025

Digital Technologies Research Centre, National Research Council of Canada, Ottawa, ON K1K 4P7, Canada.

Huiting Ou, Anuradha Surendra, Graeme S V McDowell, Emily Hashimoto-Roth, Jianguo Xia

Motivation: Missing values are prevalent in high-throughput measurements due to various experimental or analytical reasons. Imputation, the process of replacing missing values in a dataset with estimated values, plays an important role in multivariate and machine learning analyses. The three missingness patterns, including missing completely at random, missing at random, and missing not at random, describe unique dependencies between the missing and observed data.

Noise reduction in brain magnetic resonance imaging using adaptive wavelet thresholding based on linear prediction factor.

Front Neurosci

January 2025

Graduate Program in Electrical Engineering, Federal University of Pará - UFPA, Belém, Brazil.

Ananias Pereira Neto, Fabrício J B Barros

Introduction: Wavelet thresholding techniques are crucial in mitigating noise in data communication and storage systems. In image processing, particularly in medical imaging like MRI, noise reduction is vital for improving visual quality and accurate analysis. While existing methods offer noise reduction, they often suffer from limitations like edge and texture loss, poor smoothness, and the need for manual parameter tuning.

Optimizing droplet coalescence dynamics in microchannels: A comprehensive study using response surface methodology and machine learning algorithms.

Heliyon

January 2025

Institute for Nanomaterials, Advanced Technologies and Innovation, Technical University of Liberec, 46117, Liberec, Czech Republic.

Seyed Morteza Javadpour, Erfan Kadivar, Zienab Heidary Zarneh, Ebrahim Kadivar, Mohammad Gheibi

Droplet coalescence in microchannels is a complex phenomenon influenced by various parameters such as droplet size, velocity, liquid surface tension, and droplet-droplet spacing. In this study, we thoroughly investigate the impact of these control parameters on droplet coalescence dynamics within a sudden expansion microchannel using two distinct numerical methods. Initially, we employ the boundary element method to solve the Brinkman integral equation, providing detailed insights into the underlying physics of droplet coalescence.

Development and applications of a machine learning model for an in-depth analysis of pentylenetetrazol-induced seizure-like behaviors in adult zebrafish.

Neuroscience

January 2025

Laboratory of Experimental Neuropsychobiology, Department of Biochemistry and Molecular Biology, Federal University of Santa Maria, Santa Maria, RS, Brazil; Graduate Program in Biological Sciences: Toxicological Biochemistry, Federal University of Santa Maria, Santa Maria, RS, Brazil; The International Zebrafish Neuroscience Research Consortium (ZNRC), Slidell, LA, United States.

Barbara D Fontana, Laura Blanco, Angela E Uchoa, Mariana L Müller, Falco L Gonçalves

Epilepsy, a neurological disorder causing recurring seizures, is often studied in zebrafish by exposing animals to pentylenetetrazol (PTZ), which induces clonic- and tonic-like behaviors. While adult zebrafish seizure-like behaviors are well characterized, manual assessment remains challenging due to its time-consuming nature, potential for human error/bias, and the risk of overlooking subtle behaviors. Aiming to circumvent these issues, we developed a machine learning model for automating the analysis of subtle abnormal and seizure-like behaviors in PTZ-exposed adult zebrafish.

Blood-based epigenome-wide association study and prediction of alcohol consumption.

Clin Epigenetics

January 2025

Centre for Genomic and Experimental Medicine, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK.

Elena Bernabeu, Aleksandra D Chybowska, Jacob K Kresovich, Matthew Suderman, Daniel L McCartney

Alcohol consumption is an important risk factor for multiple diseases. It is typically assessed via self-report, which is open to measurement error through recall bias. Instead, molecular data such as blood-based DNA methylation (DNAm) could be used to derive a more objective measure of alcohol consumption by incorporating information from cytosine-phosphate-guanine (CpG) sites known to be linked to the trait.
