Proteochemometric modeling in a Bayesian framework.

J Cheminform

Institut Pasteur, Unité de Bioinformatique Structurale; CNRS UMR 3825; Département de Biologie Structurale et Chimie.

Published: July 2014

Proteochemometrics (PCM) is an approach to predictive bioactivity modeling that relates protein and chemical information to activity. Gaussian Processes (GP), based on Bayesian inference, provide the most objective estimation of the uncertainty of the predictions, thus permitting the evaluation of the applicability domain (AD) of the model. Furthermore, the experimental error on bioactivity measurements can be used as input for this probabilistic model. In this study, we apply GP implemented with a panel of kernels to three diverse, multispecies PCM datasets. The first dataset consisted of information from 8 human and rat adenosine receptors with 10,999 small molecule ligands and their binding affinity. The second consisted of the catalytic activity of four dengue virus NS3 proteases on 56 small peptides. Finally, we gathered bioactivity information of small molecule ligands on 91 aminergic GPCRs from 9 different species, leading to a dataset of 24,593 datapoints with a matrix completeness of only 2.43%. GP models trained on these datasets are statistically sound, at the same level of statistical significance as Support Vector Machines (SVM), with [Formula: see text] values on the external dataset ranging from 0.68 to 0.92, and RMSEP values close to the experimental error. Furthermore, the best GP models, obtained with the normalized polynomial and radial kernels, provide confidence intervals for the predictions in agreement with the cumulative Gaussian distribution. GP models were also interpreted on the basis of individual targets and of ligand descriptors. In the dengue dataset, the model interpretation in terms of the amino-acid positions in the tetra-peptide ligands gave biologically meaningful results.
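To make the role of the predictive uncertainty concrete, the following is a minimal, dependency-free sketch of GP regression with a radial kernel and per-measurement noise. It is not the implementation used in the study: the function names, the toy descriptor vectors, the activity values, and the noise variances are all hypothetical and chosen only for illustration. Each input row stands for a compound descriptor concatenated with a target descriptor, as in PCM.

```python
import math

def rbf_kernel(x, y, length_scale=1.0):
    """Radial (squared-exponential) kernel between two descriptor vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))

def normalized_poly_kernel(x, y, degree=2, c=1.0):
    """Polynomial kernel rescaled so that k(x, x) == 1 for every x."""
    def poly(u, v):
        return (sum(a * b for a, b in zip(u, v)) + c) ** degree
    return poly(x, y) / math.sqrt(poly(x, x) * poly(y, y))

def cholesky(A):
    """Lower-triangular L with L L^T = A (A must be positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L z = b."""
    z = [0.0] * len(b)
    for i in range(len(b)):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    return z

def solve_upper(L, z):
    """Back substitution: solve L^T a = z."""
    n = len(z)
    a = [0.0] * n
    for i in reversed(range(n)):
        a[i] = (z[i] - sum(L[k][i] * a[k] for k in range(i + 1, n))) / L[i][i]
    return a

def gp_fit(X, y, noise_var, kernel=rbf_kernel):
    """Fit a GP; noise_var[i] is the assumed experimental error variance of y[i]."""
    n = len(X)
    y_mean = sum(y) / n
    K = [[kernel(X[i], X[j]) + (noise_var[i] if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper(L, solve_lower(L, [v - y_mean for v in y]))
    return L, alpha, y_mean

def gp_predict(x_new, X, model, kernel=rbf_kernel):
    """Return the predictive mean and standard deviation of the latent activity."""
    L, alpha, y_mean = model
    k_star = [kernel(x_new, xi) for xi in X]
    mean = y_mean + sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = kernel(x_new, x_new) - sum(vi * vi for vi in v)
    return mean, math.sqrt(max(var, 0.0))

# Hypothetical toy data: [compound descriptor, target descriptor] -> activity
X = [[0.0, 0.0], [0.5, 0.0], [1.0, 1.0], [1.5, 1.0]]
y = [6.0, 6.5, 7.2, 7.8]
noise_var = [0.25, 0.25, 0.25, 0.25]  # assumed experimental error variances

model = gp_fit(X, y, noise_var)
mean_near, std_near = gp_predict([0.5, 0.0], X, model)   # inside the training region
mean_far, std_far = gp_predict([10.0, 10.0], X, model)   # far from all training data

# Approximate 95% interval under the Gaussian predictive distribution
interval_near = (mean_near - 1.96 * std_near, mean_near + 1.96 * std_near)
```

In this sketch the query far from the training data falls back to the prior: its mean reverts to the training mean and its standard deviation to the prior value, which is the behavior that lets the predictive variance serve as an applicability-domain indicator. Passing `kernel=normalized_poly_kernel` to both `gp_fit` and `gp_predict` swaps in the other kernel family named in the abstract.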

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4083135
DOI: http://dx.doi.org/10.1186/1758-2946-6-35
