Structural Key Bit Occurrence Frequencies and Dependencies in PubChem and Their Effect on Similarity Searches.

Mol Inform

Sci-Tec, Inc. 107 Canterbury Rd, Oak Ridge, TN 37830-7712, USA.

Published: April 2013

Little published literature exists on the 881-bit structural keys used by PubChem for categorizing and comparing the compounds present in its database. We characterized these structural keys by examining their frequencies of occurrence within the PubChem compound database. In addition, bit dependencies, defined as the universal presence of a bit given the presence of another, were determined. We show that the vast majority of bits are rarely set and that substantial numbers of dependencies exist. Using the Tanimoto coefficient, we compared similarity searches run with the full structural keys against searches run with a variant in which all dependent bits were removed, taking five United States Food and Drug Administration-approved drugs as reference compounds. The bit dependencies not only affect similarity scores, but also alter the compounds returned in similarity searching. Judicious selection of bits is needed to maintain sufficient ability to differentiate related compounds.
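The ideas in the abstract can be sketched on toy data. The following Python example is a minimal illustration, not the authors' implementation: the five-bit fingerprints, the function names, and the rule for choosing which bits to drop (remove every implied bit, and keep the lower index when two bits always co-occur) are assumptions made for the sketch. Each fingerprint is represented as the set of bit positions that are on.

```python
def bit_frequencies(fingerprints, n_bits):
    """Fraction of fingerprints in which each bit position is set."""
    counts = [0] * n_bits
    for fp in fingerprints:
        for bit in fp:
            counts[bit] += 1
    return [c / len(fingerprints) for c in counts]


def find_dependencies(fingerprints, n_bits):
    """Pairs (a, b) such that every fingerprint that sets bit a also sets bit b."""
    # carriers[a] holds the indices of the fingerprints in which bit a is set.
    carriers = [set() for _ in range(n_bits)]
    for i, fp in enumerate(fingerprints):
        for bit in fp:
            carriers[bit].add(i)
    deps = set()
    for a in range(n_bits):
        if not carriers[a]:
            continue  # a bit that is never set gives no evidence either way
        for b in range(n_bits):
            if a != b and carriers[a] <= carriers[b]:
                deps.add((a, b))
    return deps


def implied_bits(deps):
    """Bits to drop (assumed rule): every implied bit b in a pair (a, b).
    When a and b imply each other, only the higher index is dropped."""
    drop = set()
    for a, b in deps:
        drop.add(max(a, b) if (b, a) in deps else b)
    return drop


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


# Toy data (hypothetical): four compounds, five bit positions.
N_BITS = 5
fps = [{0, 1, 2}, {1, 2}, {1, 3}, {0, 1, 4}]

freqs = bit_frequencies(fps, N_BITS)
deps = find_dependencies(fps, N_BITS)
drop = implied_bits(deps)
reduced = [fp - drop for fp in fps]

# Compare scores with the full keys and with the dependent bits removed.
full_01, reduced_01 = tanimoto(fps[0], fps[1]), tanimoto(reduced[0], reduced[1])
full_03, reduced_03 = tanimoto(fps[0], fps[3]), tanimoto(reduced[0], reduced[3])

print("frequencies:", freqs)
print("dependencies:", sorted(deps))
print("dropped bits:", sorted(drop))
print("compounds 0 vs 1:", round(full_01, 3), "->", reduced_01)
print("compounds 0 vs 3:", full_03, "->", reduced_03)
```

In this toy set, removing the implied bits raises the score for one pair of compounds and lowers it for another, which is the kind of shift in similarity scores and rankings that the abstract describes. With only five bits the effect is exaggerated; it illustrates the direction of the effect, not its size on real 881-bit keys.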

Source
http://dx.doi.org/10.1002/minf.201300006

Publication Analysis

Top Keywords

structural keys: 12
similarity searches: 8
bit dependencies: 8
structural: 4
structural key: 4
bit: 4
key bit: 4
bit occurrence: 4
occurrence frequencies: 4
dependencies: 4

Similar Publications

Comparative Analysis of Recurrent Neural Networks with Conjoint Fingerprints for Skin Corrosion Prediction.

J Chem Inf Model

January 2025

Division of Pharmaceutical Chemistry, Faculty of Pharmaceutical Sciences, Khon Kaen University, Khon Kaen 40002, Thailand.

Skin corrosion assessment is an essential toxicity end point that addresses safety concerns for topical dosage forms and cosmetic products. Previously, skin corrosion assessments required animal testing; however, differences in skin architecture and ethical concerns regarding animal models have fostered the advancement of alternative methods such as in vitro and in silico models. This study aimed to develop deep learning (DL) models based on recurrent neural networks (RNNs) for classifying skin corrosion of chemical compounds based on chemical language notation, molecular substructure, physicochemical properties, and a combination of these three properties called conjoint fingerprints.

Algebraic structures are highly effective in designing symmetric key cryptosystems; however, if the key space is not sufficiently large, such systems become vulnerable to brute-force attacks. To address this challenge, our research focuses on enlarging the key space in symmetric key schemes by integrating the non-chain ring with a four-dimensional chaotic system. While chaotic maps offer significant potential for data processing, relying solely on them does not fully leverage their operational advantages.

Surgical treatment for severe endodontic-periodontal lesion: A case report with 2-year follow-up.

Clin Adv Periodontics

January 2025

Department of Periodontology, Graduate School of Medical and Dental Sciences, Institute of Science Tokyo, Tokyo, Japan.

Background: Various surgical techniques have recently been developed for periodontal tissue regeneration, especially those that do not involve any incisions in the interdental papillae at the regeneration site. These techniques have significant advantages for obtaining clinical attachment gain with the least amount of gingival recession; however, they may also have disadvantages such as a limited surgical field of view, difficulty in debridement, and access limited to the buccal side. This case report presents a 2-year follow-up of a novel surgical approach to periodontal regeneration that overcomes these limitations: the flexible tunnel technique (FTT).

Background: All chemical forms of energy and oxygen on Earth are generated via photosynthesis where light energy is converted into redox energy by two photosystems (PS I and PS II). There is an increasing number of PS I 3D structures deposited in the Protein Data Bank (PDB). The Triangular Spatial Relationship (TSR)-based algorithm converts 3D structures into integers (TSR keys).

Wing geometric morphometrics is effective to separate sand fly species (Diptera, Psychodidae, Phlebotominae) related with leishmaniasis transmission in Mexico.

Acta Trop

January 2025

Colección Nacional de Insectos, Departamento de Zoología, Instituto de Biología, Universidad Nacional Autónoma de México, Ciudad de México, Mexico.

Nearly 32 % of sand fly species recorded in Mexico are related to Leishmania transmission. A correct morphological identification of sand flies is essential to improve epidemiological and control strategies. Wing geometric morphometrics (GM) has proven to be a complementary tool for classical taxonomy, allowing us to explore variations in structure and shape between species.
