Estimation of the applicability domain of kernel-based machine learning models for virtual screening.

J Cheminform

Center for Bioinformatics Tübingen (ZBIT), University of Tübingen, Sand 1, 72076 Tübingen, Germany.

Published: March 2010

Background: The virtual screening of large compound databases is an important application of structure-activity relationship models. Due to the high structural diversity of these data sets, it is impossible for machine-learning-based QSAR models, which rely on a specific training set, to give reliable results for all compounds. Thus, it is important to consider the subset of the chemical space in which the model is applicable. The approaches to this problem that have been published so far mostly use vectorial descriptor representations to define this domain of applicability of the model. Unfortunately, these cannot be extended easily to structured kernel-based machine learning models. For this reason, we propose three approaches to estimate the domain of applicability of a kernel-based QSAR model.

Results: We evaluated three kernel-based applicability domain estimations using three different structured kernels on three virtual screening tasks. Each experiment consisted of training a kernel-based QSAR model using support vector regression and ranking a disjoint screening data set according to the predicted activity. For each prediction, the applicability of the model to the respective compound was quantified by a score obtained from an applicability domain formulation. The suitability of the applicability domain estimation was evaluated by comparing the model performance on the subsets of the screening data sets obtained with different thresholds for the applicability scores. This comparison indicates that it is possible to separate the part of the chemical space in which the model gives reliable predictions from the part consisting of structures too dissimilar to the training set for the model to be applied successfully. A closer inspection reveals that the virtual screening performance of the model is considerably improved if the half of the molecules with the lowest applicability scores is omitted from the screening.
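The threshold-based evaluation described above can be illustrated with a minimal sketch. It assumes that each screening compound already has a predicted activity, an applicability score, and a known activity label. The function names, the hit-rate measure, the top fraction, and the toy numbers are all hypothetical and chosen only for illustration; the abstract does not state which performance measure or score formulation the authors used.

```python
from typing import List, Sequence, Tuple


def hit_rate_at_top(pred: Sequence[float], active: Sequence[int],
                    top_fraction: float = 0.25) -> float:
    """Fraction of actives among the top-ranked compounds (ranked by prediction)."""
    n_top = max(1, int(len(pred) * top_fraction))
    order = sorted(range(len(pred)), key=lambda i: pred[i], reverse=True)
    return sum(active[i] for i in order[:n_top]) / n_top


def performance_by_threshold(pred: Sequence[float], ad_score: Sequence[float],
                             active: Sequence[int],
                             keep_fractions: Sequence[float] = (1.0, 0.5),
                             top_fraction: float = 0.25
                             ) -> List[Tuple[float, int, float]]:
    """Keep the compounds with the highest applicability scores and re-evaluate.

    Returns one (kept fraction, number kept, hit rate) tuple per kept fraction.
    """
    by_score = sorted(range(len(pred)), key=lambda i: ad_score[i], reverse=True)
    results = []
    for frac in keep_fractions:
        n_keep = max(1, int(len(pred) * frac))
        kept = by_score[:n_keep]
        rate = hit_rate_at_top([pred[i] for i in kept],
                               [active[i] for i in kept],
                               top_fraction)
        results.append((frac, n_keep, rate))
    return results


# Toy screening set (made-up values, not data from the paper).
pred = [9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0]      # predicted activity
ad_score = [0.2, 0.9, 0.1, 0.8, 0.7, 0.3, 0.6, 0.4]  # applicability score
active = [0, 1, 0, 1, 0, 0, 0, 1]                    # 1 = known active

results = performance_by_threshold(pred, ad_score, active)
for frac, n_keep, rate in results:
    print(f"kept {frac:.0%} ({n_keep} compounds): hit rate in top ranks = {rate:.2f}")
```

In this toy set, the top-ranked compound has a low applicability score and is inactive, so discarding the low-scoring half raises the hit rate among the top ranks. One active compound is also discarded, which mirrors the trade-off discussed in the conclusion.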

Conclusion: The proposed applicability domain formulations for kernel-based QSAR models can successfully identify compounds for which no reliable predictions can be expected from the model. The resulting reduction of the search space and the elimination of some of the active compounds should not be considered a drawback, because the results indicate that, in most cases, these omitted ligands would not be found by the model anyway.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2851576
DOI: http://dx.doi.org/10.1186/1758-2946-2-2

Publication Analysis

Top Keywords

applicability domain (20)
virtual screening (16)
machine learning (12)
kernel-based qsar (12)
model (10)
applicability (9)
kernel-based machine (8)
learning models (8)
data sets (8)
qsar models (8)

Similar Publications

Extending the MST Model to Large Biomolecular Systems: Parametrization of the ddCOSMO-MST Continuum Solvation Model.

J Comput Chem

January 2025

Departament de Farmàcia i Tecnologia Farmacèutica, i Fisicoquímica, Facultat de Farmàcia i Ciències de l'Alimentació, Universitat de Barcelona (UB), Barcelona, Spain.

Continuum solvation models such as the polarizable continuum model and the conductor-like screening model are widely used in quantum chemistry, but their application to large biosystems is hampered by their computational cost. Here, we report the parametrization of the Miertus-Scrocco-Tomasi (MST) model for the prediction of hydration free energies of neutral and ionic molecules based on the domain decomposition formulation of COSMO (ddCOSMO), which allows a drastic reduction of the computational cost by several orders of magnitude. We also introduce several novelties in MST, like a new definition of atom types based on hybridization and an automatic setup of the cavity for charged regions.

COX-2 Inhibitor Prediction With KNIME: A Codeless Automated Machine Learning-Based Virtual Screening Workflow.

J Comput Chem

January 2025

Pharmaceutical Chemistry Research Laboratory 1, Department of Pharmaceutical Engineering & Technology, Indian Institute of Technology (Banaras Hindu University), Varanasi, India.

Cyclooxygenase-2 (COX-2) is an enzyme that plays a crucial role in inflammation by converting arachidonic acid into prostaglandins. Overexpression of this enzyme is associated with conditions such as cancer, arthritis, and Alzheimer's disease (AD), where it contributes to neuroinflammation. In silico virtual screening is pivotal in early-stage drug discovery; however, the absence of coding or machine learning expertise can impede the development of reliable computational models capable of accurately predicting inhibitor compounds based on their chemical structure.

Awareness and Knowledge About Preconception Healthcare: A Cross-Sectional Study of Early Years UAE Medical Students.

J Clin Med

December 2024

Department of Obstetrics & Gynecology, College of Medicine & Health Sciences (CMHS), United Arab Emirates University, Al Ain P.O. Box 15551, United Arab Emirates.

Preconception health is critical for improving maternal and child health. The main objective of the study was to explore medical students' health habits, quality of life, and knowledge of preconception healthcare. We conducted a cross-sectional study between 15 March 2023 and 31 May 2024 among medical students at United Arab Emirates University.

Telerehabilitation and Its Impact Following Stroke: An Umbrella Review of Systematic Reviews.

J Clin Med

December 2024

Department of Public Health and Sport Sciences, Faculty of Health and Life Sciences, Medical School, University of Exeter, Exeter EX1 2LU, UK.

To summarize the impact of various telerehabilitation interventions on motor function, balance, gait, activities of daily living (ADLs), and quality of life (QoL) among patients with stroke and to determine the existing telerehabilitation interventions for delivering physiotherapy sessions in clinical practice. Six electronic databases were searched to identify relevant quantitative systematic reviews (SRs). Due to substantial heterogeneity, the data were analysed narratively.

Breast cancer (BC) is one of the most lethal cancers worldwide, and its early diagnosis is critical for improving patient survival rates. However, the extraction of key information from complex medical images and the attainment of high-precision classification present a significant challenge. In the field of signal processing, texture-rich images typically exhibit periodic patterns and structures, which are manifested as significant energy concentrations at specific frequencies in the frequency domain.
