Combinatorial chemistry and high-throughput screening technologies produce huge amounts of data on a regular basis. Sifting through these libraries of compounds and their associated assay data to identify appropriate series for follow-up is a daunting task, which has created a need for computational techniques that can find coherent islands of structure-activity relationships in this sea. Structural unit analysis (SUA) examines an entire data set to identify the molecular substructures or fragments that distinguish compounds with high activity from those with average activity. The algorithm is iterative and follows set heuristics to generate the structural units. It produces graphs that each represent a set of units, and these graphs become SUA rules. Finding all of the input structures that match these graphs generates clusters. The Apriori algorithm for association rule mining is adapted to explore all of the combinations of structural units that define useful series. User-defined constraints are applied to series selection and to the refinement of rules. The significance of a series is determined by applying statistical methods appropriate to each data set. Application to the NCI-H23 (DTP Human Tumor Cell Line Screen) database illustrates the process by which structural series are identified. An application of the method to scaffold hopping is then discussed in connection with proprietary screening data from a lead optimization project directed toward the treatment of respiratory tract infections at Bayer Healthcare. SUA successfully identified promising alternative core structures as well as compounds with above-average activity and selectivity.
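To make the Apriori step more concrete, the following is a minimal sketch of a level-wise search over combinations of structural units. It is not the authors' implementation: it is the textbook Apriori scheme, and it assumes substructure matching has already been done, so each compound is reduced to a set of unit labels plus one activity value. The function name `apriori_series`, the parameters `min_support` and `min_mean_activity`, the unit labels "A", "B", "C", and the activity values are all hypothetical. Support (the number of matching compounds) is used for pruning because it can only shrink as units are added; mean activity does not have that property, so it is applied only as a final filter.

```python
from itertools import combinations


def apriori_series(compounds, min_support=2, min_mean_activity=0.0):
    """Level-wise (Apriori-style) search over unit combinations.

    compounds: list of (set_of_unit_labels, activity) pairs (hypothetical format).
    Returns {frozenset(units): (support, mean_activity)} for every combination
    matched by at least min_support compounds whose mean activity is at least
    min_mean_activity.
    """
    def stats(units):
        # Activities of all compounds containing every unit in `units`.
        acts = [a for s, a in compounds if units <= s]
        return len(acts), (sum(acts) / len(acts) if acts else 0.0)

    # Level 1: single units that occur in enough compounds.
    all_units = sorted({u for s, _ in compounds for u in s})
    frequent = [frozenset([u]) for u in all_units
                if stats(frozenset([u]))[0] >= min_support]

    results = {}
    k = 1
    while frequent:
        # Report frequent combinations that also pass the activity filter.
        for units in frequent:
            support, mean = stats(units)
            if mean >= min_mean_activity:
                results[units] = (support, mean)

        # Join step: merge frequent k-sets into candidate (k+1)-sets.
        current = set(frequent)
        candidates = {a | b for a in frequent for b in frequent
                      if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        candidates = [c for c in candidates
                      if all(frozenset(sub) in current
                             for sub in combinations(c, k))]
        frequent = [c for c in candidates if stats(c)[0] >= min_support]
        k += 1
    return results


# Toy data: unit labels and activities are invented for illustration only.
compounds = [
    ({"A", "B"}, 8.0),
    ({"A", "B", "C"}, 9.0),
    ({"A"}, 4.0),
    ({"C"}, 3.0),
    ({"B", "C"}, 5.0),
]
series = apriori_series(compounds, min_support=2, min_mean_activity=6.0)
for units, (support, mean) in sorted(series.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(units), support, round(mean, 2))
```

In this toy run the combination {A, B, C} is never scored, because its subset {A, C} is matched by only one compound and is pruned first. A real application would replace the label sets with graph-based substructure matches and the mean-activity filter with a significance test suited to the data set.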
DOI: http://dx.doi.org/10.1021/ci050432z