Democratizing cheminformatics: interpretable chemical grouping using an automated KNIME workflow.

J Cheminform

National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA.

Published: August 2024

With the increased availability of chemical data in public databases, new techniques and algorithms have emerged for analyzing, exploring, visualizing, and extracting information from these data. One such technique is chemical grouping, in which chemicals with common characteristics are categorized into distinct groups based on physicochemical properties, use, biological activity, or a combination of these. However, existing tools for chemical grouping often require specialized programming skills or commercial software packages. To address these challenges, we developed a user-friendly chemical grouping workflow implemented in KNIME, a free, open-source, low/no-code data analytics platform. The workflow integrates molecular descriptor calculation, feature selection, dimensionality reduction, hyperparameter search, and supervised and unsupervised machine learning methods, enabling chemical grouping and visualization of the results. We also implemented interpretation tools that identify the key molecular descriptors for each chemical group and generate natural language summaries explaining the rationale behind the groupings. The workflow is designed to run in both the KNIME local desktop version and, as a web application, the KNIME Server WebPortal. It includes interactive interfaces and guides that take users through the analysis step by step. We demonstrate the utility of this workflow through a case study using an eye irritation and corrosion dataset.

Scientific contributions

This work presents a novel, comprehensive chemical grouping workflow in KNIME that improves accessibility through a user-friendly graphical interface, eliminating the need for extensive programming skills. The workflow combines automated molecular descriptor calculation, feature selection, dimensionality reduction, and machine learning algorithms (both supervised and unsupervised) with hyperparameter optimization to refine chemical grouping accuracy. It also introduces an interpretation step and natural language summaries that explain the underlying reasons for the chemical groupings, improving both the usability of the tool and the interpretability of the results.
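The abstract names the workflow's stages without showing how they connect. As a rough illustration only, here is a minimal, standard-library Python sketch of descriptor-based grouping followed by an interpretation step. It is not the authors' KNIME workflow: the function names (`scale_columns`, `select_features`, `kmeans`, `silhouette`, `group_chemicals`), the variance threshold used as feature selection, k-means as the clustering method, choosing k by mean silhouette score as the hyperparameter search, and the z-score ranking used to produce the "key descriptor" sentences are all hypothetical choices. Dimensionality reduction and the supervised learning step are omitted, descriptors are assumed to be precomputed, and the descriptor values are made up.

```python
# Hypothetical sketch, NOT the authors' KNIME workflow. Assumptions: descriptors are
# precomputed, variance filtering stands in for feature selection, k-means is the
# clustering method, and k is chosen by mean silhouette score.
import math
from statistics import mean, pstdev

def scale_columns(rows):
    """Min-max scale each descriptor column to [0, 1]; constant columns become 0."""
    scaled_cols = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0
        scaled_cols.append([(v - lo) / span for v in col])
    return [list(r) for r in zip(*scaled_cols)]

def select_features(rows, names, min_var=0.01):
    """Drop descriptors whose scaled population variance is below min_var."""
    keep = [i for i, col in enumerate(zip(*rows)) if pstdev(col) ** 2 >= min_var]
    return [[r[i] for i in keep] for r in rows], [names[i] for i in keep]

def kmeans(points, k, iters=100):
    """Plain k-means with deterministic farthest-first initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient; members of singleton clusters score 0."""
    if len(set(labels)) < 2:
        return -1.0
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not own:
            scores.append(0.0)
            continue
        a = mean(own)  # mean distance within the point's own group
        b = min(mean(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c)
                for c in set(labels) if c != labels[i])  # nearest other group
        scores.append((b - a) / max(a, b) if max(a, b) else 0.0)
    return mean(scores)

def group_chemicals(rows, names, k_values=range(2, 5), min_var=0.01, top=2):
    """Scale, filter, pick k by silhouette, then describe each group in words."""
    points, kept = select_features(scale_columns(rows), names, min_var)
    best = None
    for k in k_values:
        if k >= len(points):
            continue  # silhouette is undefined when k >= number of chemicals
        labels = kmeans(points, k)
        score = silhouette(points, labels)
        if best is None or score > best[0]:
            best = (score, k, labels)
    _, k, labels = best
    overall = [mean(col) for col in zip(*points)]
    spread = [pstdev(col) or 1.0 for col in zip(*points)]
    summary = []
    for c in sorted(set(labels)):
        members = [p for p, lab in zip(points, labels) if lab == c]
        # z-score of the group mean against the whole dataset, per kept descriptor
        z = [(mean(col) - m) / s for col, m, s in zip(zip(*members), overall, spread)]
        ranked = sorted(range(len(kept)), key=lambda f: -abs(z[f]))[:top]
        traits = " and ".join(f"{'high' if z[f] > 0 else 'low'} {kept[f]}" for f in ranked)
        summary.append(f"Group {c} ({len(members)} chemicals): distinguished by {traits}.")
    return k, labels, summary

# Toy descriptor table with made-up values: two obvious groups plus a constant column.
names = ["MolWt", "LogP", "TPSA", "Charge"]
rows = [
    [120.0, 0.5, 60.0, 0.0], [130.0, 0.7, 65.0, 0.0], [125.0, 0.6, 62.0, 0.0],
    [400.0, 5.0, 20.0, 0.0], [410.0, 5.5, 18.0, 0.0], [390.0, 4.8, 22.0, 0.0],
]
k, labels, summary = group_chemicals(rows, names)
for line in summary:
    print(line)
```

In this toy run the constant `Charge` column is removed by the variance filter, and the silhouette search should favor two groups. Ranking descriptors by how far each group's mean sits from the dataset mean is only one simple way to turn cluster statistics into readable sentences; a real workflow would compute descriptors with a cheminformatics toolkit and might use different feature-selection, clustering, or interpretation methods.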

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11330086
DOI: http://dx.doi.org/10.1186/s13321-024-00894-1

Similar Publications

The tumor microenvironment (TME) is integral to cancer progression, impacting metastasis and treatment response. It consists of diverse cell types, extracellular matrix components, and signaling molecules that interact to promote tumor growth and therapeutic resistance. Elucidating the intricate interactions between cancer cells and the TME is crucial in understanding cancer progression and therapeutic challenges.

Improving care experiences for premenstrual symptoms and disorders in the United Kingdom (UK): a mixed-methods approach.

BMC Health Serv Res

January 2025

Cambridge Centre for Neuropsychiatric Research, Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK.

Background: Poor care experiences are reported for premenstrual disorders, which may result in negative outcomes such as distress, reduced healthcare engagement, and delays to diagnosis. This research aimed to explore healthcare experiences for premenstrual symptoms in the United Kingdom and identify areas for potential improvements based on participant responses.

Method: An online survey was delivered, with participants recruited via social media.

Background: Organic fertilizers are safer and more eco-friendly than chemical fertilizers and can therefore support sustainable farming. Plant growth-promoting rhizobacteria (PGPRs) have manifold effects in agriculture, especially in monoculture crops, where the soil needs to be modified to increase germination, yield, and disease resistance. The objective of this study was to assess the effects of PGPRs combined with fertilizer on the yield and productivity of canola.

Nontarget Analysis and Characterization of a Group of Abundant Polyfluoroalkyl Substances─Fluorinated Benzoylurea Pesticides and Their Analogues and Transformation Products in Fish by LC-HRMS and Chemical Species-Specific Algorithms.

J Agric Food Chem

January 2025

Guangdong Key Laboratory of Environmental Resources Utilization and Protection, State Key Laboratory of Organic Geochemistry, Guangzhou Institute of Geochemistry, Chinese Academy of Sciences, Guangzhou 510640, China.

Poly- and perfluoroalkyl substances (PFASs) are a large class of fluorinated chemicals used in various industrial and agrochemical products, such as fluorinated benzoylurea (FBU) pesticides. Prompted by an incidental, preliminary finding of three highly abundant FBUs in fish, this study performed nontarget analysis and characterization of FBUs, together with their analogues and transformation products (TPs), in fish using liquid chromatography, high-resolution mass spectrometry, and chemical species-specific algorithms. A total of 23 FBU-relevant compounds were found and their structures tentatively or accurately elucidated, including 18 PFASs and 5 non-PFAS compounds; of these, 4 were original FBUs, 8 were FBU analogues, and 11 were FBU-TPs.

Biosynthesis of lactacystin as a proteasome inhibitor.

Commun Chem

January 2025

Graduate School of Engineering, Hokkaido University, N13-W8, Kita-ku, Sapporo, Hokkaido, 060-8628, Japan.

Lactacystin is an irreversible proteasome inhibitor isolated from Streptomyces lactacystinicus. Despite the importance of its biological activity, the biosynthesis of lactacystin has remained unknown. In this study, we identified the lactacystin biosynthetic gene cluster through gene disruption and heterologous expression experiments.