RISC: Rapid Inverted-Index Based Search of Chemical Fingerprints.

J Chem Inf Model

Department of Computer Science, IIT-Delhi, New Delhi, 110016, India.

Published: June 2019

Searching massive molecular repositories for a query molecule is a fundamental task in chemoinformatics and drug discovery. Chemical fingerprints are commonly used to characterize the structure and properties of molecules. Some fingerprints, particularly unfolded fingerprints, are often extremely high-dimensional and sparse, with only a few features having a positive value. In this work, we propose a new search algorithm, RISC, which exploits sparsity in high-dimensional fingerprints to derive effective pruning mechanisms and dramatically improve search efficiency. RISC is robust enough to work on both binary and nonbinary chemical fingerprints. Extensive experiments on range queries and top-k queries across several molecular repositories demonstrate that for fingerprints of dimension 2048 and above, which is often the case with unfolded fingerprints, RISC is consistently faster than state-of-the-art techniques. The source code of our implementation is available at http://www.cse.iitd.ac.in/~sayan/software.html.
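
To make the idea of exploiting sparsity concrete, the following is a minimal sketch of a generic inverted-index range query over sparse binary fingerprints. It is not the RISC algorithm from the paper: the choice of Tanimoto similarity, the size-based pruning bound, and the names `build_index` and `range_query` are all assumptions made for illustration. Each fingerprint is represented as a set of the feature ids that are "on".

```python
from collections import defaultdict


def build_index(fingerprints):
    """Map each feature id to the list of molecule ids that contain it.

    `fingerprints` is a list of sets of feature ids (hypothetical layout).
    """
    index = defaultdict(list)
    for mol_id, features in enumerate(fingerprints):
        for feature in features:
            index[feature].append(mol_id)
    return index


def range_query(query, fingerprints, index, threshold):
    """Return (mol_id, tanimoto) pairs with similarity >= threshold.

    Assumes threshold > 0, so molecules sharing no feature with the
    query can never qualify and are never touched.
    """
    # Walk only the posting lists of features present in the query,
    # counting how many features each candidate shares with it.
    overlap = defaultdict(int)
    for feature in query:
        for mol_id in index.get(feature, ()):
            overlap[mol_id] += 1

    hits = []
    q = len(query)
    for mol_id, shared in overlap.items():
        m = len(fingerprints[mol_id])
        # Size bound: Tanimoto can never exceed min(q, m) / max(q, m),
        # so candidates failing this test are skipped without scoring.
        if min(q, m) < threshold * max(q, m):
            continue
        similarity = shared / (q + m - shared)
        if similarity >= threshold:
            hits.append((mol_id, similarity))
    # Most similar molecules first.
    return sorted(hits, key=lambda hit: -hit[1])
```

With the toy repository `[{1, 5, 9}, {1, 5, 10}, {2, 3}, {1, 5, 9, 100000}]`, the query `{1, 5, 9}` and a threshold of 0.7, the molecule `{2, 3}` is never visited because it appears in none of the query's posting lists. The cost of the query depends on the number of nonzero features rather than on the nominal dimension of the fingerprint, which is why an index of this kind suits unfolded fingerprints.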

Source
http://dx.doi.org/10.1021/acs.jcim.9b00069
