Hashing algorithms and data structures for rapid searches of fingerprint vectors.

J Chem Inf Model

School of Information and Computer Sciences, Institute for Genomics and Bioinformatics, University of California, Irvine, Irvine, California 92697-3435, USA.

Published: August 2010

In many large chemoinformatics database systems, molecules are represented by long binary fingerprint vectors whose components record the presence or absence of particular functional groups or combinatorial features. To speed up database searches, we propose to add to each fingerprint a short signature integer vector of length M. For a given fingerprint, the i-th component of the signature vector counts the number of 1-bits in the fingerprint that fall on components congruent to i modulo M. Given two signatures, we show how one can rapidly compute a bound on the Jaccard-Tanimoto similarity measure of the two corresponding fingerprints, using the intersection bound. Thus, these signatures allow one to significantly prune the search space by discarding molecules associated with unfavorable bounds. Analytical methods are developed to predict the resulting amount of pruning as a function of M. Data structures combining different values of M are also developed, together with methods for predicting the optimal values of M for a given implementation. Simulations using a particular implementation show that the proposed approach leads to a speedup of one order of magnitude over a linear search and a 3-fold speedup over a previous implementation. All theoretical results and predictions are corroborated by large-scale simulations using molecules from the ChemDB. Several possible algorithmic extensions are discussed.
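The signature construction and pruning step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`modulo_signature`, `tanimoto_upper_bound`, `pruned_search`), the list-of-bits fingerprint representation, and the threshold-search loop are all assumptions made for the example. The bound used here relies on the fact that, within each residue class modulo M, the two fingerprints cannot share more 1-bits than the smaller of the two counts, so the sum of component-wise minima is an upper bound on the intersection size.

```python
from typing import List, Sequence, Tuple


def modulo_signature(fp: Sequence[int], m: int) -> List[int]:
    """Component i counts the 1-bits of fp at positions congruent to i modulo m."""
    sig = [0] * m
    for pos, bit in enumerate(fp):
        if bit:
            sig[pos % m] += 1
    return sig


def tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Exact Jaccard-Tanimoto similarity of two equal-length binary vectors."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 1.0


def tanimoto_upper_bound(sig_a: Sequence[int], sig_b: Sequence[int]) -> float:
    """Upper bound on the Tanimoto similarity computed from signatures only.

    sum(min(a_i, b_i)) bounds the intersection from above; because
    x / (A + B - x) increases with x, substituting the bound for x
    yields an upper bound on the similarity.
    """
    inter_bound = sum(min(x, y) for x, y in zip(sig_a, sig_b))
    union_bound = sum(sig_a) + sum(sig_b) - inter_bound
    return inter_bound / union_bound if union_bound else 1.0


def pruned_search(
    query: Sequence[int],
    database: Sequence[Tuple[Sequence[int], Sequence[int]]],
    threshold: float,
    m: int,
) -> List[Tuple[int, float]]:
    """Return (index, similarity) for entries with similarity >= threshold.

    database holds (fingerprint, precomputed signature) pairs. Entries whose
    signature bound falls below the threshold are skipped without computing
    the exact similarity.
    """
    q_sig = modulo_signature(query, m)
    hits = []
    for idx, (fp, sig) in enumerate(database):
        if tanimoto_upper_bound(q_sig, sig) < threshold:
            continue  # pruned: the exact similarity cannot reach the threshold
        sim = tanimoto(query, fp)
        if sim >= threshold:
            hits.append((idx, sim))
    return hits
```

In this sketch the signatures are computed once per database entry and stored alongside the fingerprints, so a query pays only for M integer comparisons per pruned molecule. Larger values of M tighten the bound but cost more per comparison, which is the trade-off the abstract's analysis of optimal M addresses.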

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2926297
DOI: http://dx.doi.org/10.1021/ci100132g

Similar Publications

Evaluating the impact of modeling choices on the performance of integrated genetic and clinical models.

Genet Med

December 2024

Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN; Center for Digital Genomic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN; Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, TN. Electronic address:

Purpose: The value of genetic information for improving the performance of clinical risk prediction models has yielded variable conclusions. Many methodological decisions have the potential to contribute to differential results. We performed multiple modeling experiments integrating clinical and demographic data from electronic health records (EHR) with genetic data to understand which decisions may affect performance.

Following the (revised) latent state-trait theory, the present study investigates the within-subject reliability, occasion specificity, common consistency, and construct validity of cognitive control measures in an intensive longitudinal design. These indices were calculated applying dynamic structural equation modeling while accounting for autoregressive effects and trait change. In two studies, participants completed two cognitive control tasks (Stroop and go/no-go) and answered questions about goal pursuit, self-control, executive functions, and situational aspects, multiple times per day.

Immunoglobulin A nephropathy (IgAN) is the most common primary glomerulonephritis worldwide with heterogeneous histopathological phenotypes. Although IgAN with membranoproliferative glomerulonephritis (MPGN)-like features has been reported in children and adults, treatment strategies for this rare IgAN subtype have not been established. Here, we present the case of a 56-year-old man with no history of kidney disease who initially presented with nephrotic syndrome.

Background: The surgical management of complicated diverticulitis varies across Europe. EAES members prioritized this topic to be addressed by a clinical practice guideline through an online questionnaire.

Objective: To develop evidence-informed clinical practice recommendations for key stakeholders involved in the treatment of complicated diverticulitis; to improve operative and perioperative outcomes, patient experience and quality of life through a systematic evidence-to-decision approach by a diverse, multidisciplinary panel.

This study assessed the factors militating against the effective implementation of electronic health records (EHR) in Nigeria. EHR, the computerization of patients' health records, offers many benefits, including improved patient satisfaction, improved care processes, and reductions in patient waiting time and medication errors. Despite these benefits, healthcare organizations are slow to adopt the EHR system. Therefore, the study assessed the factors militating against the effective implementation of the EHR system, the level of awareness of EHR, and the utilization of electronic health records.
