Utilizing Low-Dimensional Molecular Embeddings for Rapid Chemical Similarity Search.

Kathryn E Kirchoff, James Wellnitz, Joshua E Hochuli, Travis Maxfield, Konstantin I Popov, Shawn Gomez, Alexander Tropsha

Adv Inf Retr

Eshelman School of Pharmacy, UNC Chapel Hill.

Published: March 2024

Nearest neighbor-based similarity searching is a common task in chemistry, with notable use cases in drug discovery. Yet some of the most commonly used methods for this task still rely on brute-force search. In practice this can be computationally costly and overly time-consuming, due in part to the sheer size of modern chemical databases. Previous computational advances for this task have generally relied on improvements to hardware or on dataset-specific tricks that lack generalizability. Approaches that leverage lower-complexity search algorithms remain relatively underexplored, in part because many of these algorithms are approximate solutions and/or struggle with typical high-dimensional chemical embeddings. Here we evaluate whether a combination of low-dimensional chemical embeddings and a k-d tree data structure can achieve fast nearest neighbor queries while maintaining performance on standard chemical similarity search benchmarks. We examine different dimensionality reductions of standard chemical embeddings as well as a learned, structurally-aware embedding, SmallSA, for this task. With this framework, searches on over one billion chemicals execute in less than a second on a single CPU core, five orders of magnitude faster than the brute-force approach. We also demonstrate that SmallSA achieves competitive performance on chemical similarity benchmarks.
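The core idea of the abstract, an exact nearest neighbor query over low-dimensional vectors using a k-d tree instead of a linear scan, can be illustrated with a minimal sketch. This is not the authors' implementation: the random vectors merely stand in for molecular embeddings, the dimension (8) and database size (2000) are arbitrary assumptions, and the names `build_kdtree`, `nearest`, and `brute_force` are hypothetical helpers written for this illustration.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


class Node:
    """One k-d tree node: a stored point index and the axis it splits on."""
    __slots__ = ("index", "axis", "left", "right")

    def __init__(self, index: int, axis: int,
                 left: Optional["Node"], right: Optional["Node"]) -> None:
        self.index, self.axis, self.left, self.right = index, axis, left, right


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_kdtree(points: List[Point], indices: Optional[List[int]] = None,
                 depth: int = 0) -> Optional[Node]:
    """Recursively split the points at the median, cycling through the axes."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    axis = depth % len(points[0])
    indices.sort(key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return Node(indices[mid], axis,
                build_kdtree(points, indices[:mid], depth + 1),
                build_kdtree(points, indices[mid + 1:], depth + 1))


def nearest(node: Optional[Node], points: List[Point], query: Point,
            best: Optional[Tuple[float, int]] = None) -> Optional[Tuple[float, int]]:
    """Return (squared distance, index) of the exact nearest neighbor."""
    if node is None:
        return best
    d = sq_dist(points[node.index], query)
    if best is None or d < best[0]:
        best = (d, node.index)
    diff = query[node.axis] - points[node.index][node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, points, query, best)
    # Visit the far side only if the splitting plane is closer than the best hit.
    if diff * diff < best[0]:
        best = nearest(far, points, query, best)
    return best


def brute_force(points: List[Point], query: Point) -> Tuple[float, int]:
    """Linear scan over every point, the baseline the abstract refers to."""
    return min((sq_dist(p, query), i) for i, p in enumerate(points))


# Random vectors as stand-ins for low-dimensional embeddings (assumption).
random.seed(0)
DIM, N = 8, 2000
database = [tuple(random.random() for _ in range(DIM)) for _ in range(N)]
tree = build_kdtree(database)

query = tuple(random.random() for _ in range(DIM))
tree_hit = nearest(tree, database, query)
scan_hit = brute_force(database, query)
print("k-d tree and brute force agree:", tree_hit[0] == scan_hit[0])
```

Both functions return the same neighbor, but the tree query can skip whole subtrees whose splitting plane lies farther away than the current best match. That pruning is effective only when the dimension is small, which is why the approach is paired with low-dimensional embeddings. In practice a library implementation would be used rather than a hand-written tree.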

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10998712	PMC
http://dx.doi.org/10.1007/978-3-031-56060-6_3	DOI Listing
