Language models trained on molecular string representations have shown strong performance in predictive and generative tasks. However, practical applications require not only accurate predictions but also explainability: the ability to explain the rationale behind the predictions. In this work, we explore explainability for a chemical language model by adapting a transformer-specific and a model-agnostic input attribution technique. We fine-tune a pretrained model to predict aqueous solubility, compare training and architecture variants, and evaluate visualizations of attributed relevance. The model-agnostic SHAP technique provides sensible attributions, highlighting the positive influence of individual electronegative atoms, but it neither explains the model in terms of functional groups nor reveals how the model internally represents molecular strings to make predictions. In contrast, the adapted transformer-specific explainability technique produces sparse attributions that cannot be directly attributed to functional groups relevant to solubility. Instead, these attributions are more characteristic of how the model maps molecular strings to its latent space, which seems to represent features relevant to molecular similarity rather than functional groups. These findings provide insight into the representations underpinning chemical language models, which we propose may be leveraged to design informative chemical spaces for training more accurate, advanced and explainable models.
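SHAP attributions are estimates of Shapley values, which distribute a prediction across input features according to their average marginal contribution over all subsets of the other features. For a very short SMILES string the values can be computed exactly by enumerating every coalition of tokens. The sketch below is not the authors' model or pipeline: the regex tokenizer, the `TOY_WEIGHTS` table, the interaction bonus and the choice of "masked tokens contribute zero" as the baseline are all hypothetical assumptions made for illustration, and the `toy_model` scorer stands in for a fine-tuned transformer.

```python
import itertools
import math
import re

# Simplified SMILES tokenizer (an assumption; real chemical language models
# ship their own tokenizers). Two-letter halogens and bracket atoms are tried
# before single characters so "Cl" is not split into "C" + "l".
SMILES_TOKEN = re.compile(r"Cl|Br|\[[^\]]+\]|[A-Za-z]|\d|[=#()+\-/\\@]")


def tokenize(smiles: str) -> list[str]:
    return SMILES_TOKEN.findall(smiles)


# Hypothetical per-token weights for a toy "solubility" score.
# Illustrative numbers only, not fitted to any data.
TOY_WEIGHTS = {"O": 0.8, "N": 0.6, "C": -0.3, "c": -0.4, "Cl": -0.7}


def toy_model(tokens: list[str], present: frozenset) -> float:
    """Score using only the token positions in `present`; masked tokens
    contribute nothing (a zero baseline, itself an assumption)."""
    kept = [tokens[i] for i in present]
    score = sum(TOY_WEIGHTS.get(t, 0.0) for t in kept)
    # Toy interaction term: a bonus when an O and an N co-occur.
    if "O" in kept and "N" in kept:
        score += 0.2
    return score


def exact_shapley(tokens: list[str], model) -> list[float]:
    """Exact Shapley value of each token position by enumerating all
    coalitions of the other positions; cost grows as 2**n, so this is
    only feasible for short strings (SHAP libraries approximate it)."""
    n = len(tokens)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (model(tokens, s | {i}) - model(tokens, s))
    return phi


if __name__ == "__main__":
    tokens = tokenize("NCCO")  # ethanolamine
    phi = exact_shapley(tokens, toy_model)
    print(list(zip(tokens, [round(p, 3) for p in phi])))
    # prints [('N', 0.7), ('C', -0.3), ('C', -0.3), ('O', 0.9)]
```

In this toy setting the O and N tokens receive positive attributions and the 0.2 interaction bonus is split equally between them, while the attributions sum to the difference between the full and fully masked scores (the efficiency property). Like the SHAP results summarized above, such per-token values say nothing directly about functional groups or about the model's internal representation.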
Source | Type
---|---
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11389796 | PMC
http://dx.doi.org/10.1039/d4dd00084f | DOI Listing
Future Oncol
January 2025
Department of Medicine, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
J Cheminform
January 2025
Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, University of Bonn, Friedrich-Hirzebruch-Allee 5/6, 53115, Bonn, Germany.
Analogue series (AS) are generated during compound optimization in medicinal chemistry and are the major source of structure-activity relationship (SAR) information. Pairs of active AS consisting of compounds with corresponding substituents and comparable potency progression represent SAR transfer events for the same target or across different targets. We report a new computational approach to systematically search for SAR transfer series that combines an AS alignment algorithm with context-dependent similarity assessment based on vector embeddings adapted from natural language processing.
Data Brief
February 2025
Agroscope, Socio-Economics Group, Tänikon 1, 8356 Ettenhausen, Switzerland.
This article describes data from an online survey conducted with the Swiss public from the two largest language regions (German and French) in Switzerland. The survey was conducted in February 2023. Participants were recruited through a professional panel provider, and quotas were used for age, gender and language region.
Heliyon
January 2025
Pharmaceutical Sciences and Technology Program, Faculty of Pharmaceutical Sciences, Chulalongkorn University, Bangkok, 10330, Thailand.
Hyaluronic acid (HA) is a popular surface modifier in targeted cancer delivery due to its receptor-binding abilities. However, HA alone faces limitations in lipid solubility, biocompatibility, and cell internalization, making it less effective as a standalone delivery system. This comprehensive study aimed to explore a dynamic landscape of complexation in HA-based nanoparticles in cancer therapy, examining diverse aspects from influential modifiers to emerging trends in cancer diagnostics.
Langmuir
January 2025
Department of Chemical and Materials Engineering, University of Kentucky, Lexington, Kentucky 40506, United States.
Antibiofouling peptide materials prevent the nonspecific adsorption of proteins on devices, enabling them to perform their designed functions as desired in complex biological environments. Due to their importance, research on antibiofouling peptide materials has been one of the central subjects of interfacial engineering. However, only a few antibiofouling peptide sequences have been developed.