This paper evaluates the effectiveness of various similarity coefficients for 2D similarity searching when multiple bioactive target structures are available. Similarity searches using several different activity classes within the MDL Drug Data Report and the Dictionary of Natural Products databases are performed using BCI 2D fingerprints. Using data fusion techniques to combine the resulting nearest neighbor lists, we obtain group recall results which, in many cases, are a considerable improvement over the standard average recall values obtained for individual structures. It is shown that the degree of improvement can be related to the structural diversity of the activity class that is searched for, the best results being found for the most diverse groups. The group recall of active compounds using subsets of the class is also investigated: for highly self-similar activity classes, the group recall improvement saturates well before the full activity class size is reached. A rough correlation is found between the relative improvement using the group recall and the square of the number of unique compounds available in all of the merged lists. The Tanimoto coefficient is found unambiguously to be the best coefficient to use for the recovery of active compounds using multiple targets. Furthermore, when using the Tanimoto coefficient, the "MAX" fusion rule is found to be more effective than the "SUM" rule for the combination of similarity searches from multiple targets. The use of group recall can lead to improved enrichment in database searches and virtual screening.
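To make the terms above concrete, the following is a minimal Python sketch of Tanimoto similarity on binary fingerprints, combined across multiple targets with the "MAX" and "SUM" fusion rules, and scored by recall of actives in the top k. It is an illustration under stated assumptions, not the paper's implementation: fingerprints are modeled as plain sets of "on" bit positions rather than BCI 2D fingerprints, SUM is assumed to add raw similarity scores (the paper may fuse ranks instead), and the helper names (`tanimoto`, `fuse_scores`, `recall_at_k`) and the four-compound toy database are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set

# Assumption: a fingerprint is the set of its "on" bit positions.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared on-bits over on-bits in either fingerprint."""
    common = len(a & b)
    union = len(a) + len(b) - common
    return common / union if union else 0.0


def fuse_scores(targets: List[Fingerprint],
                database: Dict[str, Fingerprint],
                rule: str = "MAX") -> Dict[str, float]:
    """Score each database compound against every target, then fuse.

    MAX keeps the best similarity to any single target.
    SUM adds the similarities over all targets (assumed score-based here).
    """
    if rule not in ("MAX", "SUM"):
        raise ValueError("rule must be 'MAX' or 'SUM'")
    fused = {}
    for name, fp in database.items():
        scores = [tanimoto(t, fp) for t in targets]
        fused[name] = max(scores) if rule == "MAX" else sum(scores)
    return fused


def recall_at_k(fused: Dict[str, float], actives: Set[str], k: int) -> float:
    """Fraction of known actives found among the k top-ranked compounds."""
    # Sort by descending fused score; break ties by name for determinism.
    ranked = sorted(fused, key=lambda name: (-fused[name], name))
    hits = sum(1 for name in ranked[:k] if name in actives)
    return hits / len(actives)


# Hypothetical toy data: two structurally dissimilar targets from one class.
targets = [frozenset({1, 2, 3, 4}), frozenset({10, 11, 12, 13})]
database = {
    "active_1": frozenset({1, 2, 3, 5}),      # close to the first target only
    "active_2": frozenset({10, 11, 12, 14}),  # close to the second target only
    "decoy_1": frozenset({1, 2, 10, 11}),     # weakly similar to both targets
    "decoy_2": frozenset({20, 21, 22, 23}),   # similar to neither
}
actives = {"active_1", "active_2"}

max_recall = recall_at_k(fuse_scores(targets, database, "MAX"), actives, k=2)
sum_recall = recall_at_k(fuse_scores(targets, database, "SUM"), actives, k=2)
```

In this constructed example, MAX ranks both actives ahead of the decoy that is weakly similar to both targets, while SUM promotes that decoy into the top two. The toy data were chosen to show how the two rules can differ for a diverse class; they are not results from the paper.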
DOI: http://dx.doi.org/10.1021/ci049867x