J Chem Inf Model
June 2015
Support vector machines (SVMs) are among the preferred machine learning algorithms for virtual compound screening and activity prediction because of their frequently observed high performance levels. However, a well-known conundrum of SVMs (and other supervised learning methods) is the black box character of their predictions, which makes it difficult to understand why models succeed or fail. Herein we introduce an approach to rationalize the performance of SVM models based upon the Tanimoto kernel compared with the linear kernel.
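The two kernels contrasted in this abstract can be written down directly. The following is a minimal sketch, assuming binary fingerprints stored as equal-length 0/1 lists. It uses the standard Tanimoto kernel formula K(x, y) = <x, y> / (<x, x> + <y, y> - <x, y>), not code from the study. The function names are illustrative, and so is the choice to return 0.0 for two all-zero vectors.

```python
from typing import List, Sequence


def linear_kernel(x: Sequence[int], y: Sequence[int]) -> float:
    """Linear kernel: the dot product <x, y> of two fingerprint vectors."""
    if len(x) != len(y):
        raise ValueError("fingerprints must have the same length")
    return float(sum(a * b for a, b in zip(x, y)))


def tanimoto_kernel(x: Sequence[int], y: Sequence[int]) -> float:
    """Tanimoto kernel: <x, y> / (<x, x> + <y, y> - <x, y>)."""
    xy = linear_kernel(x, y)
    denom = linear_kernel(x, x) + linear_kernel(y, y) - xy
    # Assumed convention: two all-zero fingerprints get kernel value 0.0.
    return xy / denom if denom else 0.0


def gram_matrix(X: Sequence[Sequence[int]], kernel) -> List[List[float]]:
    """Pairwise kernel matrix over a list of fingerprints."""
    return [[kernel(a, b) for b in X] for a in X]


a = [1, 1, 0, 1, 0]
b = [1, 0, 0, 1, 1]
print(linear_kernel(a, b))    # 2.0 (two shared on-bits)
print(tanimoto_kernel(a, b))  # 0.5 = 2 / (3 + 3 - 2)
```

The linear kernel counts shared on-bits. The Tanimoto kernel normalizes that count by the size of the bit union, so it is not dominated by fingerprints with many set bits. A matrix built with `gram_matrix` could be passed to scikit-learn's `SVC(kernel="precomputed")` to train an SVM with either kernel.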
Support vector machines are a popular machine learning method for many classification tasks in biology and chemistry. In addition, the support vector regression (SVR) variant is widely used for numerical property predictions. In chemoinformatics and pharmaceutical research, SVR has probably become the most popular approach for modeling non-linear structure-activity relationships (SARs) and predicting compound potency values.
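SVR differs from SVM classification mainly in its loss. Prediction errors smaller than a tolerance epsilon are ignored, and larger errors are penalized linearly. The following is a minimal sketch of that standard epsilon-insensitive loss. The potency values are hypothetical illustrations, not data from the cited work.

```python
from typing import Sequence


def epsilon_insensitive_loss(y_true: float, y_pred: float, epsilon: float = 0.1) -> float:
    """Zero inside the epsilon tube, linear in |error| - epsilon outside it."""
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    return max(0.0, abs(y_true - y_pred) - epsilon)


def mean_epsilon_loss(y_true: Sequence[float], y_pred: Sequence[float],
                      epsilon: float = 0.1) -> float:
    """Average epsilon-insensitive loss over paired observations."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(epsilon_insensitive_loss(t, p, epsilon) for t, p in zip(y_true, y_pred))
    return total / len(y_true)


# Hypothetical potency values (e.g., pKi units), for illustration only.
observed = [7.0, 8.2, 6.1]
predicted = [7.05, 7.6, 6.1]
print(mean_epsilon_loss(observed, predicted, epsilon=0.1))
```

Only the second compound falls outside the tube here, so it alone contributes to the loss. In scikit-learn, the tube width corresponds to the `epsilon` parameter of `sklearn.svm.SVR`.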
A new methodology for activity prediction of compounds from SAR matrices is introduced that is based upon conditional probabilities of activity. The approach has low computational complexity, is primarily designed for hit expansion from biological screening data, and accurately predicts both active and inactive compounds. Its performance is comparable to state-of-the-art machine learning methods such as support vector machines or Bayesian classification.
Support vector machines (SVMs) are among the most popular machine learning methods for compound classification and other chemoinformatics tasks such as the prediction of ligand-target pairs or compound activity profiles. Depending on the specific application, different SVM strategies can be used. For example, in the context of potency-directed virtual screening, linear combinations of multiple SVM models have been shown to enrich database selection sets with potent compounds compared to individual models.
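A linear combination of models can be illustrated as a weighted sum of per-model scores, followed by ranking the database by the combined score. The following is a minimal sketch under stated assumptions. The abstract does not specify how the study weighted its models. Here, the scores are taken to be SVM decision-function values, the weights are chosen by hand, and the compound identifiers are hypothetical.

```python
from typing import List, Sequence


def combine_scores(model_scores: Sequence[Sequence[float]],
                   weights: Sequence[float]) -> List[float]:
    """Weighted sum of per-model scores for each database compound.

    model_scores[i][j] is the (assumed) SVM decision value of model i for compound j.
    """
    if not model_scores or len(weights) != len(model_scores):
        raise ValueError("need one weight per model")
    n = len(model_scores[0])
    if any(len(s) != n for s in model_scores):
        raise ValueError("all models must score the same compounds")
    return [sum(w * s[j] for w, s in zip(weights, model_scores)) for j in range(n)]


def top_k(ids: Sequence[str], combined: Sequence[float], k: int) -> List[str]:
    """Return the k compound identifiers with the highest combined scores."""
    order = sorted(range(len(ids)), key=lambda j: combined[j], reverse=True)
    return [ids[j] for j in order[:k]]


# Hypothetical decision values from two models for three compounds.
ids = ["c1", "c2", "c3"]
scores_a = [1.2, -0.3, 0.8]
scores_b = [0.4, 0.9, 1.0]
combined = combine_scores([scores_a, scores_b], weights=[0.5, 0.5])
print(top_k(ids, combined, 2))  # ['c3', 'c1']
```

Equal weights are only a placeholder. In practice, the weights would need to be chosen on held-out data, for example to maximize the potency of the top-ranked selection set.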
Supervised machine learning models are widely used in chemoinformatics, especially for the prediction of new active compounds or targets of known actives. Bayesian classification methods are among the most popular machine learning approaches for the prediction of activity from chemical structure. Much work has focused on predicting structure-activity relationships (SARs) on the basis of experimental training data.
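Bayesian classification can be made concrete with a generic Bernoulli naive Bayes model over binary fingerprint bits. The following is a minimal sketch assuming Laplace smoothing. It is a textbook variant, not necessarily the Bayesian formulation used in the cited studies. The toy fingerprints are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Model = Dict[int, Tuple[float, List[float]]]


def train_bernoulli_nb(X: Sequence[Sequence[int]], y: Sequence[int],
                       alpha: float = 1.0) -> Model:
    """Per class: log prior and smoothed P(bit = 1 | class) for each bit."""
    if not X or len(X) != len(y):
        raise ValueError("need non-empty, equally sized X and y")
    n_features = len(X[0])
    model: Model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        log_prior = math.log(len(rows) / len(X))
        # Laplace smoothing keeps every probability strictly between 0 and 1.
        probs = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                 for j in range(n_features)]
        model[c] = (log_prior, probs)
    return model


def predict(model: Model, x: Sequence[int]) -> int:
    """Return the class with the highest log posterior (up to a constant)."""
    best_class, best_score = None, -math.inf
    for c, (log_prior, probs) in model.items():
        score = log_prior + sum(math.log(p) if bit else math.log(1.0 - p)
                                for bit, p in zip(x, probs))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical 3-bit fingerprints; label 1 = active, 0 = inactive.
X = [[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]]
y = [1, 1, 0, 0]
model = train_bernoulli_nb(X, y)
print(predict(model, [1, 0, 0]))  # 1
```

Working in log space avoids numerical underflow when fingerprints have thousands of bits. scikit-learn offers a comparable estimator as `sklearn.naive_bayes.BernoulliNB`.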
Profiling of compound libraries against arrays of targets has become an important approach in pharmaceutical research. The prediction of multi-target compound activities also represents an attractive task for machine learning with potential for drug discovery applications. Herein, we have explored activity prediction in high-dimensional target space.
Active compounds can participate in different local structure-activity relationship (SAR) environments and introduce different degrees of local SAR discontinuity, depending on their structural and potency relationships in data sets. Such SAR features have thus far mostly been analyzed using descriptive approaches, in particular, on the basis of activity landscape modeling. However, compounds in different local SAR environments have not yet been predicted.
Chem Biol Drug Des
July 2014
Profiling of compounds against target families has become an important approach in pharmaceutical research for the identification of hits and analysis of selectivity and promiscuity patterns. We report on modeling of profiling experiments involving 429 potential inhibitors and a panel of 24 different kinases using support vector machine (SVM) techniques and naïve Bayesian classification. The experimental matrix contained many different activity profiles.
Supervised machine learning approaches, including support vector machines, random forests, Bayesian classifiers, nearest-neighbor similarity searching, and a conceptually distinct mapping algorithm termed DynaMAD, have been investigated for their ability to detect structurally related ligands of a given receptor with different mechanisms of action. For this purpose, a large number of simulated virtual screening trials were carried out with models trained on mechanistic subsets of different classes of receptor ligands. The results revealed that ligands with the desired mechanism of action were frequently contained in database selection sets of limited size.
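Nearest-neighbor similarity searching, one of the methods compared in this abstract, ranks database compounds by their highest Tanimoto similarity to any reference ligand. The following is a minimal sketch assuming fingerprints stored as sets of on-bit indices and max-similarity fusion over the references. The compound identifiers and bit sets are hypothetical, and the selection-set size k is chosen arbitrarily.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for on-bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def nn_rank(references: Sequence[FrozenSet[int]],
            database: Dict[str, FrozenSet[int]],
            k: int) -> List[Tuple[str, float]]:
    """Top-k database compounds by maximum similarity to any reference (1-NN)."""
    if not references:
        raise ValueError("need at least one reference fingerprint")
    scored = [(cid, max(tanimoto(fp, ref) for ref in references))
              for cid, fp in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical reference ligand and database fingerprints.
refs = [frozenset({1, 2, 3, 4})]
db = {
    "a": frozenset({1, 2, 3}),
    "b": frozenset({5, 6}),
    "c": frozenset({1, 2, 3, 4, 5}),
}
print(nn_rank(refs, db, 2))  # [('c', 0.8), ('a', 0.75)]
```

The top-k slice plays the role of a database selection set of limited size. With RDKit fingerprints, `DataStructs.BulkTanimotoSimilarity` would compute the similarity values in one call.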
The emerging chemical patterns (ECP) approach has been introduced for compound classification. Thus far, only very few ECP applications have been reported. Here, we further investigate the ECP methodology by studying complex classification problems.