Comparing the Influence of Simulated Experimental Errors on 12 Machine Learning Algorithms in Bioactivity Modeling Using 12 Diverse Data Sets.

J Chem Inf Model

Département de Biologie Structurale et Chimie, Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3825, 25, rue du Dr Roux, 75015 Paris, Ile de France, France.

Published: July 2015

To date, no systematic study has assessed the effect of random experimental errors on the predictive power of QSAR models. To fill this gap, we benchmarked the noise sensitivity of 12 learning algorithms on 12 data sets (15,840 models in total), namely: Support Vector Machines (SVM) with radial and polynomial (Poly) kernels, Gaussian Processes (GP) with radial and polynomial kernels, Relevance Vector Machines (radial kernel), Random Forest (RF), Gradient Boosting Machines (GBM), Bagged Regression Trees, Partial Least Squares, and k-Nearest Neighbors. Model performance on the test set was used as a proxy for the relative noise sensitivity of these algorithms, monitored as a function of the level of simulated noise added to the bioactivities in the training set. The noise was simulated by sampling from Gaussian distributions with increasingly large variances, ranging from zero up to the range of pIC50 values spanned by a given data set. General trends were identified by designing a full-factorial experiment, which was analyzed with a normal linear model. Overall, GBM displayed low noise tolerance, although its performance was comparable to that of RF, SVM Radial, SVM Poly, GP Poly, and GP Radial at low noise levels. Of practical relevance, we show that the bag fraction parameter has a marked influence on the noise sensitivity of GBM, suggesting that low values (e.g., 0.1-0.2) should be used for this parameter when modeling noisy data. The remaining 11 algorithms display comparable noise tolerance: model performance degrades smoothly and linearly as the noise level increases. However, SVM Poly and GP Poly display significant noise sensitivity at high noise levels in some cases. Overall, these results provide a practical guide for making informed decisions about which algorithm and parameter values to use according to the noise level present in the data.
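The noise-injection protocol described in the abstract can be illustrated with a small Python sketch. This is not the authors' code: the function names, the synthetic data, and the pure-Python k-Nearest Neighbors regressor are all assumptions made for illustration. In particular, the sketch assumes the noise variance is set to a fraction of the range of the training values (0 means no noise, 1 means a variance equal to the full range); the abstract does not specify the exact noise levels used in between.

```python
import math
import random


def add_label_noise(y_train, noise_fraction, rng):
    """Return a copy of y_train with zero-mean Gaussian noise added.

    Assumption: variance = noise_fraction * (max - min) of the training
    values, so 0 gives no noise and 1 gives a variance equal to the range.
    """
    value_range = max(y_train) - min(y_train)
    sigma = math.sqrt(noise_fraction * value_range)
    return [y + rng.gauss(0.0, sigma) for y in y_train]


def knn_predict(X_train, y_train, x, k=5):
    """Predict the mean response of the k nearest training points (Euclidean)."""
    nearest = sorted(range(len(X_train)),
                     key=lambda i: math.dist(X_train[i], x))[:k]
    return sum(y_train[i] for i in nearest) / len(nearest)


def rmse_on_test_set(X_train, y_train, X_test, y_test, k=5):
    """Root-mean-square error of kNN predictions on the (noise-free) test set."""
    squared = [(knn_predict(X_train, y_train, x, k) - y) ** 2
               for x, y in zip(X_test, y_test)]
    return math.sqrt(sum(squared) / len(squared))


def noise_sweep(X_train, y_train, X_test, y_test, fractions, seed=0):
    """Train on increasingly noisy labels; evaluate on untouched test labels."""
    rng = random.Random(seed)
    results = {}
    for fraction in fractions:
        y_noisy = add_label_noise(y_train, fraction, rng)
        results[fraction] = rmse_on_test_set(X_train, y_noisy, X_test, y_test)
    return results


if __name__ == "__main__":
    # Synthetic stand-in for descriptors and pIC50 values (hypothetical data).
    data_rng = random.Random(42)
    X = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(300)]
    y = [6.0 + 2.0 * a - b for a, b in X]
    sweep = noise_sweep(X[:200], y[:200], X[200:], y[200:],
                        fractions=[0.0, 0.25, 0.5, 1.0])
    for fraction, error in sweep.items():
        print(f"noise fraction {fraction:.2f}: test RMSE {error:.3f}")
```

Only the training labels are perturbed; the test labels stay untouched, so any change in test error reflects how well the learner tolerates noisy training data. Swapping in another regressor for `knn_predict` would give the per-algorithm comparison the abstract describes.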

Source: http://dx.doi.org/10.1021/acs.jcim.5b00101
