Drug discovery is a multiparameter optimization process in which the goal of a project is to identify compounds that meet multiple property criteria required to achieve a therapeutic objective. However, once a profile of property criteria has been chosen, the impact of these criteria on the decisions made regarding progression of compounds or chemical series should be carefully considered. In some cases the decision is very sensitive to a specific property criterion, and such a criterion may artificially distort the direction of the project; any uncertainty in the "correct" value or the importance of this criterion may lead to valuable opportunities being missed.
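To make the sensitivity point concrete, the following sketch (the property values, the logP cutoff and the simple desirability function are hypothetical illustrations, not taken from the abstract) shows how a hard pass/fail criterion can sharply separate two nearly identical compounds, whereas a graded score barely distinguishes them:

```python
# Illustrative sketch: sensitivity of compound prioritization to a hard property cutoff.
# Property values, the cutoff and the desirability function are hypothetical.

def hard_pass(logp, cutoff=3.0):
    """Binary criterion: the compound 'passes' only if logP is at or below the cutoff."""
    return logp <= cutoff

def soft_desirability(logp, ideal=3.0, tolerance=1.0):
    """Graded score that decays smoothly as logP moves past the ideal value."""
    excess = max(0.0, logp - ideal)
    return max(0.0, 1.0 - excess / tolerance)

compounds = {"A": 2.95, "B": 3.05}  # two compounds with nearly identical logP

for name, logp in compounds.items():
    print(name, hard_pass(logp), round(soft_desirability(logp), 2))
# With the hard cutoff, A passes and B fails despite a 0.1 log-unit difference;
# the graded score ranks them almost identically (1.0 vs 0.95).
```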
All of the experimental compound data with which we work have significant uncertainties, due to imperfect correlations between experimental systems and the ultimate in vivo properties of compounds and the inherent variability in experimental conditions. When using these data to make decisions, it is essential that these uncertainties are taken into account to avoid making inappropriate decisions in the selection of compounds, which can lead to wasted effort and missed opportunities. In this paper we will consider approaches to rigorously account for uncertainties when selecting between compounds or assessing compounds against a property criterion; first for an individual measurement of a single property and then for multiple measurements of a property for the same compound.
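As a minimal illustration of the idea, and not the paper's exact formulation, the sketch below assumes a Gaussian error model and computes the probability that a compound's true value meets a criterion, first from a single measurement and then from pooled replicate measurements; the threshold, assay standard deviation and pIC50 values are hypothetical:

```python
# Illustrative sketch: probability that a compound's true property value meets a
# criterion, given a measured value and an assumed experimental standard deviation,
# under a normal error model. Values below are hypothetical.
from math import erf, sqrt

def prob_meets_criterion(measured, threshold, sd, greater_is_better=True):
    """P(true value satisfies the criterion | measurement), assuming Gaussian error."""
    z = (measured - threshold) / (sd * sqrt(2.0))
    p = 0.5 * (1.0 + erf(z))          # P(true value > threshold)
    return p if greater_is_better else 1.0 - p

def combine_replicates(measurements):
    """Pool replicate measurements of one compound: mean and standard error of the mean."""
    n = len(measurements)
    mean = sum(measurements) / n
    var = sum((x - mean) ** 2 for x in measurements) / (n - 1)
    return mean, sqrt(var / n)

# Single measurement: pIC50 of 6.1 against a criterion of >= 6.0, assay SD of 0.3
print(round(prob_meets_criterion(6.1, 6.0, 0.3), 2))   # ~0.63, far from certain

# Multiple measurements of the same compound narrow the uncertainty
mean, sem = combine_replicates([6.1, 6.3, 6.0, 6.2])
print(round(prob_meets_criterion(mean, 6.0, sem), 2))  # much closer to 1
```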
A number of alternative variables have appeared in the medicinal chemistry literature that attempt to provide a more rigorous formulation of the guidelines proposed by Lipinski to exclude chemical entities with poor pharmacokinetic properties early in the discovery process. Typically, these variables combine the affinity towards the target with physicochemical properties of the ligand and are named efficiencies or ligand efficiencies. Several formulations have been defined and used by different laboratories with varying degrees of success.
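Two widely used formulations, shown here only as examples and not necessarily the specific metrics compared in the article, are ligand efficiency (binding energy per heavy atom) and lipophilic ligand efficiency (potency corrected for lipophilicity):

```python
# Two common ligand-efficiency formulations (examples only; the article surveys
# several such metrics and these may not be the exact ones it evaluates).

def ligand_efficiency(pic50, heavy_atom_count, rt_ln10=1.37):
    """LE: approximate binding free energy per heavy atom, ~1.37 * pIC50 / HAC (kcal/mol at ~300 K)."""
    return rt_ln10 * pic50 / heavy_atom_count

def lipophilic_ligand_efficiency(pic50, clogp):
    """LLE (LipE): potency corrected for lipophilicity, pIC50 - cLogP."""
    return pic50 - clogp

# Hypothetical compound: pIC50 = 7.5, 28 heavy atoms, cLogP = 3.2
print(round(ligand_efficiency(7.5, 28), 2))              # ~0.37 kcal/mol per heavy atom
print(round(lipophilic_ligand_efficiency(7.5, 3.2), 1))  # 4.3
```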
In this article, we present an automatic model generation process for building QSAR models using Gaussian Processes, a powerful machine learning modeling method. We describe the stages of the process that ensure models are built and validated within a rigorous framework: descriptor calculation, splitting data into training, validation and test sets, descriptor filtering, application of modeling techniques and selection of the best model. We apply this automatic process to data sets of blood-brain barrier penetration and aqueous solubility and compare the resulting automatically generated models with 'manually' built models using external test sets.
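A minimal sketch of such a workflow is given below, using scikit-learn's Gaussian Process regressor as a stand-in; the random descriptor matrix, variance-based descriptor filter and kernel choice are placeholders rather than the components of the authors' system:

```python
# Minimal sketch of an automated QSAR workflow with Gaussian Processes, using
# scikit-learn as a stand-in. Descriptors, data and kernel are placeholders; the
# paper's pipeline uses its own descriptor calculation, descriptor filtering and
# model-selection framework.
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))                                   # placeholder descriptor matrix
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)    # placeholder endpoint

# 1. Split into training, validation and test sets
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=0)

# 2. Filter uninformative descriptors (here: near-zero variance)
selector = VarianceThreshold(threshold=1e-3).fit(X_train)
X_train_f, X_val_f, X_test_f = (selector.transform(a) for a in (X_train, X_val, X_test))

# 3. Fit a Gaussian Process with a noise term; kernel hyperparameters are
#    optimized by maximizing the marginal likelihood during fit()
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X_train_f, y_train)

# 4. Use the validation set to compare candidate models, then report on the test set
print("validation R^2:", round(gp.score(X_val_f, y_val), 2))
print("test R^2:", round(gp.score(X_test_f, y_test), 2))
```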