AI Article Synopsis

  • The rapid growth of publicly available chemical data offers a chance to create effective QSAR models, but concerns about data quality and inconsistencies arise, especially when merging data from various sources.
  • An automated workflow has been developed to standardize chemical structures and produce "QSAR-ready" forms before molecular descriptors are calculated, applying desalting, stereochemistry stripping, tautomer and nitro-group standardization, valence correction, neutralization, and duplicate removal.
  • This workflow supports collaborative QSAR modeling initiatives and has been adapted for other uses, such as generating "MS-ready structures" for non-targeted analysis mass spectrometry. It is freely available in KNIME, as standalone versions on GitHub, and as docker containers.

Article Abstract

The rapid increase of publicly available chemical structures and associated experimental data presents a valuable opportunity to build robust QSAR models for applications in different fields. However, a common concern is the quality of both the chemical structure information and the associated experimental data. This is especially true when those data are collected from multiple sources, as chemical substance mappings can contain many duplicate structures and molecular inconsistencies. Such issues can impact the resulting molecular descriptors and their mappings to experimental data and, subsequently, the quality of the derived models in terms of accuracy, repeatability, and reliability. Herein we describe the development of an automated workflow to standardize chemical structures according to a set of standard rules and generate two- and/or three-dimensional "QSAR-ready" forms prior to the calculation of molecular descriptors. The workflow was designed in the KNIME workflow environment and consists of three high-level steps. First, a structure encoding is read, and then the resulting in-memory representation is cross-referenced with any existing identifiers for consistency. Finally, the structure is standardized using a series of operations including desalting, stripping of stereochemistry (for two-dimensional structures), standardization of tautomers and nitro groups, valence correction, neutralization when possible, and then removal of duplicates. This workflow was initially developed to support collaborative QSAR modeling projects and to ensure consistency of the results from the different participants. It was then updated and generalized for other modeling applications. This included modification of the "QSAR-ready" workflow to generate "MS-ready structures" to support the generation of substance mappings and searches for software applications related to non-targeted analysis mass spectrometry. Both QSAR and MS-ready workflows are freely available in KNIME, via standalone versions on GitHub, and as docker container resources for the scientific community.

Scientific contribution: This work pioneers an automated workflow in KNIME, systematically standardizing chemical structures to ensure their readiness for QSAR modeling and broader scientific applications. By addressing data quality concerns through desalting, stereochemistry stripping, and normalization, it improves the accuracy and reliability of molecular descriptors. The freely available resources in KNIME, GitHub, and docker containers democratize access, benefiting collaborative research and advancing diverse modeling endeavors in chemistry and mass spectrometry.
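To make the order of operations concrete, the following is a minimal Python sketch of three of the steps named in the abstract: desalting, stereochemistry stripping, and duplicate removal. It is not the published KNIME workflow. It works on SMILES strings with plain string handling, which is only a rough approximation; a real implementation would parse structures with a cheminformatics toolkit and compare canonical forms. Tautomer and nitro-group standardization, valence correction, and neutralization are omitted. All function names (heavy_atom_count, desalt, strip_stereo, qsar_ready_sketch) are hypothetical and chosen for illustration.

```python
import re

# Assumption: atoms are approximated as SMILES tokens, either a bracket atom
# or an organic-subset symbol outside brackets. This is a heuristic, not a parser.
_ATOM = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")


def heavy_atom_count(fragment: str) -> int:
    """Approximate number of atoms in one dot-separated SMILES fragment."""
    return len(_ATOM.findall(fragment))


def desalt(smiles: str) -> str:
    """Keep the largest fragment; the first fragment wins ties."""
    return max(smiles.split("."), key=heavy_atom_count)


def strip_stereo(smiles: str) -> str:
    """Remove tetrahedral (@, @@) and double-bond (/ and backslash) markers."""
    return re.sub(r"[@/\\]", "", smiles)


def qsar_ready_sketch(records):
    """Map each standardized string to the identifiers that collapse onto it.

    `records` is an iterable of (identifier, smiles) pairs. Duplicates are
    detected by exact string match only, so two different SMILES spellings of
    the same structure are NOT merged here (a toolkit would canonicalize first).
    """
    merged = {}
    for identifier, smiles in records:
        key = strip_stereo(desalt(smiles))
        merged.setdefault(key, []).append(identifier)
    return merged


if __name__ == "__main__":
    example = [
        ("a", "CC(=O)[O-].[Na+]"),   # sodium acetate: counter-ion is dropped
        ("b", "N[C@@H](C)C(=O)O"),   # one alanine enantiomer
        ("c", "N[C@H](C)C(=O)O"),    # the other enantiomer
        ("d", "F/C=C/F.Cl"),         # stereo double bond plus a chloride fragment
    ]
    for structure, ids in qsar_ready_sketch(example).items():
        print(structure, ids)
    # CC(=O)[O-] ['a']
    # N[CH](C)C(=O)O ['b', 'c']
    # FC=CF ['d']
```

Returning the identifiers grouped under each standardized string, rather than a bare list of unique structures, keeps the link back to the original records, which matters when experimental data from several sources has to be mapped onto a single structure.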

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10880251
DOI: http://dx.doi.org/10.1186/s13321-024-00814-3

Publication Analysis

Top Keywords

chemical structures: 16
experimental data: 12
qsar modeling: 8
associated experimental: 8
substance mappings: 8
molecular descriptors: 8
automated workflow: 8
mass spectrometry: 8
github docker: 8
workflow: 7
