In the pharmaceutical industry it is common to generate many QSAR models from training sets containing a large number of molecules and a large number of descriptors. The best QSAR methods are those that can generate the most accurate predictions but that are not overly expensive computationally. In this paper we compare eXtreme Gradient Boosting (XGBoost) to random forest and single-task deep neural nets on 30 in-house data sets.
Systems chemical biology, the integration of chemistry, biology and computation to generate understanding about the way small molecules affect biological systems as a whole, as well as related fields such as chemogenomics, are central to emerging new paradigms of drug discovery such as drug repurposing and personalized medicine. Recent Semantic Web technologies such as RDF and SPARQL are technical enablers of systems chemical biology, facilitating the deployment of advanced algorithms for searching and mining large integrated datasets. In this paper, we aim to demonstrate how these technologies together can change the way that drug discovery is accomplished.
Motivation: Networks to predict protein pharmacology can be created using ligand similarity or using known bioassay response profiles of ligands. Recent publications indicate that similarity methods can be highly accurate, but it has been unclear how similarity methods compare to methods that use bioassay response data directly.
Results: We created protein networks based on ligand similarity (Similarity Ensemble Approach or SEA) and ligand bioassay response-data (BARD) using 155 Pfizer internal BioPrint assays.
Ligand-based computational models could be more readily shared between researchers and organizations if they were generated with open source molecular descriptors [e.g., chemistry development kit (CDK)] and modeling algorithms, because this would negate the requirement for proprietary commercial software.
As the cost of discovering and developing new pharmaceutically relevant compounds continues to rise, it is increasingly important to select the right molecules to prosecute very early in drug discovery. The development of high-throughput in vitro assays of hepatic metabolic clearance has allowed vast quantities of data to be generated; however, these large screens are still costly and remain dependent on animal usage. To further expand the value of these screens and ultimately aid in reducing animal usage, we have developed an in silico model of rat liver microsomal (RLM) clearance.
Computational models of cytochrome P450 3A4 inhibition were developed based on high-throughput screening data for 4470 proprietary compounds. Multiple models differentiating inhibitors (IC50 < 3 µM) and noninhibitors were generated using various machine-learning algorithms (recursive partitioning [RP], Bayesian classifier, logistic regression, k-nearest-neighbor, and support vector machine [SVM]) with structural fingerprints and topological indices. Nineteen models were evaluated by internal 10-fold cross-validation and also by an independent test set.
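The labeling rule and the 10-fold cross-validation mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the random shuffling, the seed, and the example IC50 values are all assumptions made for illustration, and only the 3 µM cutoff, the 10 folds, and the compound count of 4470 come from the text.

```python
import random


def label_inhibitor(ic50_um, threshold_um=3.0):
    """Return True when a compound counts as an inhibitor (IC50 below the cutoff)."""
    return ic50_um < threshold_um


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs so that each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumed: folds are drawn at random
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


# Hypothetical IC50 values in micromolar, for illustration only.
ic50_values = [0.5, 2.9, 3.0, 12.0, 45.0]
labels = [label_inhibitor(v) for v in ic50_values]

# Ten train/test splits over a data set the size of the one in the abstract.
splits = list(k_fold_indices(4470, k=10))
```

In a setup like this, a model would be fitted on each `train_idx` and scored on the matching `test_idx`, with the independent test set held out of the splitting entirely.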