The average and variance of the molecular similarities in a set are valuable information for cheminformatics tasks like chemical space exploration and subset selection. However, calculating the variance of the complete similarity matrix has quadratic complexity, O(N²). As the sizes of molecular libraries constantly increase, this pairwise approach becomes infeasible.
Extended similarity indices (i.e., generalizations of pairwise similarity) have recently gained importance because of their simplicity, fast computation, and superiority in tasks like diversity picking.
The presence of Activity Cliffs (ACs) is known to be a challenge for QSAR modeling. Because of their high data dependency, Machine Learning QSAR models are directly influenced by the activity landscape. We propose several extended similarity and extended SALI methods to study how the distribution of ACs across the training and test sets affects the model's errors.
The widespread use of Machine Learning (ML) techniques in chemical applications has come with the pressing need to analyze extremely large molecular libraries. In particular, clustering remains one of the most common tools to dissect the chemical space. Unfortunately, most current approaches present unfavorable time and memory scaling, which makes them unsuitable to handle million- and billion-sized sets.
The quantification of molecular similarity has been present since the beginning of cheminformatics. Although several similarity indices and molecular representations have been reported, all of them ultimately reduce to the calculation of molecular similarities of only two objects at a time. Hence, to obtain the average similarity of a set of molecules, all the pairwise comparisons need to be computed, which demands computational resources that scale quadratically with the number of molecules.
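The quadratic cost of the pairwise approach described above can be illustrated with a minimal sketch. It assumes binary fingerprints and the Tanimoto index, neither of which is specified in the abstract. The function names and the toy fingerprints are hypothetical and are not taken from the article.

```python
from itertools import combinations

import numpy as np


def tanimoto(a: np.ndarray, b: np.ndarray) -> float:
    """Tanimoto similarity between two binary (boolean) fingerprints."""
    both = np.count_nonzero(a & b)    # bits set in both fingerprints
    either = np.count_nonzero(a | b)  # bits set in at least one fingerprint
    # Convention assumed here: two all-zero fingerprints count as identical.
    return both / either if either else 1.0


def mean_pairwise_similarity(fps: np.ndarray) -> tuple[float, int]:
    """Average similarity over all N(N-1)/2 pairs, plus the pair count."""
    sims = [
        tanimoto(fps[i], fps[j])
        for i, j in combinations(range(len(fps)), 2)
    ]
    return float(np.mean(sims)), len(sims)


# Toy data (hypothetical): three molecules, four fingerprint bits each.
fingerprints = np.array(
    [[1, 1, 0, 0],
     [1, 0, 1, 0],
     [1, 1, 1, 0]],
    dtype=bool,
)

mean_sim, n_pairs = mean_pairwise_similarity(fingerprints)
print(f"{n_pairs} comparisons, mean similarity = {mean_sim:.4f}")
```

The number of comparisons grows as N(N-1)/2, so doubling the library size roughly quadruples the work, which is why this direct approach does not scale to very large sets.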
Visualization of the chemical space is useful in many aspects of chemistry, including compound library design, diversity analysis, and exploring structure-property relationships, to name a few. Notable research areas where the visualization of chemical space has strong applications include drug discovery and natural product research. However, the sheer volume of even comparatively small sub-sections of chemical space implies that we need to use approximations when navigating through chemical space.