Ensemble methods for classification in cheminformatics.

J Chem Inf Comput Sci

Computational Biology & Applied Algorithmics Group, Max-Planck-Institut für Informatik, Stuhlsatzenhausweg 85, 66123 Saarbrücken, Germany, and Roche Pharma Research, Basel, Switzerland.

Published: November 2005

We describe the application of ensemble methods to binary classification problems on two pharmaceutical compound data sets. Several variants of single and ensemble models of k-nearest neighbors classifiers, support vector machines (SVMs), and single ridge regression models are compared. All methods exhibit robust classification even when more features are given than observations. On two data sets dealing with specific properties of drug-like substances (cytochrome P450 inhibition and "Frequent Hitters", i.e., unspecific protein inhibition), we achieve classification rates above 90%. We are able to reduce the cross-validated misclassification rate for the Frequent Hitters problem by a factor of 2 compared to previous results obtained for the same data set with different modeling techniques.
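
As a rough illustration of the ideas named in the abstract (an ensemble of k-nearest neighbors classifiers, more features than observations, and a cross-validated misclassification rate), the following is a minimal sketch using only the Python standard library. The ensemble construction shown here (bootstrap resampling of observations plus a random feature subset per member, combined by majority vote) is an assumption chosen for illustration and is not necessarily the construction used in the paper. The toy data, all function names, and all parameter values are hypothetical.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k, features):
    """Majority vote among the k nearest training points, using squared
    Euclidean distance restricted to the given feature indices."""
    dists = sorted(
        (sum((row[j] - x[j]) ** 2 for j in features), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def fit_ensemble(X, y, n_members, n_features, rng):
    """Build the ensemble: each member keeps a bootstrap sample of the rows
    and a random subset of the features (an illustrative assumption)."""
    n, p = len(X), len(X[0])
    members = []
    for _ in range(n_members):
        rows = [rng.randrange(n) for _ in range(n)]
        feats = rng.sample(range(p), n_features)
        members.append(([X[i] for i in rows], [y[i] for i in rows], feats))
    return members


def ensemble_predict(members, x, k=3):
    """Combine the member predictions by majority vote."""
    votes = Counter(knn_predict(mx, my, x, k, f) for mx, my, f in members)
    return votes.most_common(1)[0][0]


def cv_misclassification(X, y, n_folds, rng, n_members, n_features):
    """Fraction of observations misclassified when each fold is held out once."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    errors = 0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        members = fit_ensemble(
            [X[i] for i in train], [y[i] for i in train],
            n_members, n_features, rng,
        )
        errors += sum(ensemble_predict(members, X[i]) != y[i] for i in held_out)
    return errors / len(X)


def make_toy_data(n, p, rng):
    """Hypothetical data: two classes that differ in the first 10 features."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(2.0 * label if j < 10 else 0.0, 1.0) for j in range(p)])
        y.append(label)
    return X, y


rng = random.Random(0)
X, y = make_toy_data(40, 60, rng)  # 40 observations, 60 features
rate = cv_misclassification(X, y, n_folds=5, rng=rng, n_members=11, n_features=20)
print(f"cross-validated misclassification rate: {rate:.2f}")
```

An odd `k` and an odd number of members avoid tied votes in the binary case. Each fold's ensemble is fitted only on the training part of that fold, so the reported rate is an estimate on held-out observations.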

Source
http://dx.doi.org/10.1021/ci049850e
