Structurally similar analogues of given query compounds can be rapidly retrieved from chemical databases using molecular similarity search approaches. However, the computational cost of an exhaustive similarity search over a large compound database is high. Although the latest indexing algorithms can greatly speed up the search process, they are not readily applicable to molecular similarity search problems because they lack an implementation of the Tanimoto similarity metric. In this paper, we first implement Python or C++ code to enable Tanimoto similarity search via several recent indexing algorithms, such as Hnsw and Onng. Moreover, there is increasing interest in the computational community in developing robust benchmarking systems to assess the performance of various computational algorithms. Here, we provide a benchmark to evaluate the molecular similarity search performance of these recent indexing algorithms. To avoid potential package dependency issues, two separate benchmarks are built on currently popular container technologies, Docker and Singularity. Singularity is a rather new container framework designed specifically for high-performance computing (HPC) platforms; it needs neither privileged permissions nor a separate daemon process. Both benchmarks are extensible and can incorporate new indexing algorithms, benchmarking data sets, and customized parameter settings. Our results demonstrate that graph-based methods, such as Hnsw and Onng, consistently achieve the best trade-off between search effectiveness and search efficiency. The source code of the entire benchmark system can be downloaded from https://github.uconn.edu/mldrugdiscovery/MssBenchmark.
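For orientation, the Tanimoto similarity of two binary fingerprints A and B is the number of bits set in both divided by the number of bits set in either, |A ∩ B| / |A ∪ B|. The following is a minimal sketch of the exhaustive search that indexing methods aim to approximate, not the code from the benchmark repository. It assumes fingerprints are stored as Python integers used as bitsets; the function names and the convention of returning 0.0 for two empty fingerprints are illustrative assumptions.

```python
from typing import List, Tuple


def popcount(x: int) -> int:
    """Count the set bits in a non-negative integer."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two fingerprints stored as integer bitsets."""
    union = popcount(a | b)
    if union == 0:
        # Assumed convention: two empty fingerprints have similarity 0.0.
        return 0.0
    return popcount(a & b) / union


def brute_force_search(query: int, database: List[int], k: int) -> List[Tuple[int, float]]:
    """Return the k most similar database entries as (index, similarity) pairs.

    Every entry is scored, so the cost grows linearly with database size.
    """
    scored = [(tanimoto(query, fp), idx) for idx, fp in enumerate(database)]
    # Sort by descending similarity; break ties by ascending index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(idx, score) for score, idx in scored[:k]]


if __name__ == "__main__":
    # Hypothetical 4-bit fingerprints, for illustration only.
    database = [0b1111, 0b1100, 0b0011, 0b0000]
    for idx, score in brute_force_search(0b1110, database, k=2):
        print(idx, round(score, 3))
```

An exact baseline of this kind is also what an approximate index would typically be scored against when measuring recall.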
DOI: http://dx.doi.org/10.1021/acs.jcim.0c00393
Behav Res Methods
December 2024
ETSI de Telecomunicación, Universidad Politécnica de Madrid, Avenida Complutense, 30, 28040, Madrid, Spain.
This study investigates the potential of large language models (LLMs) to estimate the familiarity of words and multi-word expressions (MWEs). We validated LLM estimates for isolated words using existing human familiarity ratings and found strong correlations. LLM familiarity estimates performed even better in predicting lexical decision and naming performance in megastudies than the best available word frequency measures.
Sci Rep
December 2024
Department of Pharmacy Services, Vocational School of Health Services, Osmaniye Korkut Ata University, Osmaniye, Turkey.
In this work, an artificial neural network coupled with a multi-objective genetic algorithm (ANN-NSGA-II) was used to develop a model and optimize the conditions for extracting the Mentha longifolia (L.) L. plant.
Sci Rep
December 2024
School of Intelligent Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, China.
In recent years, immune checkpoint inhibitors (ICIs) have emerged as a fundamental component of the standard treatment regimen for patients with head and neck squamous cell carcinoma (HNSCC). However, accurately predicting the treatment effectiveness of ICIs for patients at the same TNM stage remains a challenge. In this study, we first combined multi-omics data (mRNA, lncRNA, miRNA, DNA methylation, and somatic mutations) with 10 clustering algorithms, successfully identifying two distinct cancer subtypes (CSs) (CS1 and CS2).
Sci Rep
December 2024
Department of Pharmacoepidemiology, Graduate School of Medicine and Public Health, Kyoto University, Kyoto, Japan.
Although conservative treatment is commonly used for osteoporotic vertebral fracture (OVF), some patients experience functional disability following OVF. This study aimed to develop prediction models for new-onset functional impairment following admission for OVF using machine learning approaches and compare their performance. Our study consisted of patients aged 65 years or older admitted for OVF using a large hospital-based database between April 2014 and December 2021.
Sci Rep
December 2024
State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, National Medical Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China.
The gut microbiome, recognized as a critical component in the development of chronic diseases and aging processes, constitutes a promising approach for predicting host health status. Previous research has underscored the potential of microbiome-based predictions, and the rapid advancements of machine learning techniques have introduced new opportunities for exploiting microbiome data. To predict various host nonhealthy conditions, this study proposed an integrated machine learning-based estimation pipeline of Gut Age Index (GAI) by establishing a health aging baseline with the gut microbiome data from healthy individuals.