Machine learning models have found numerous successful applications in computational drug discovery. A large body of these models represents molecules as sequences, since molecular sequences are easily available, simple, and informative. Sequence-based models often segment molecular sequences into pieces called chemical words, analogous to the words that make up sentences in human languages, and then apply advanced natural language processing techniques to tasks such as de novo drug design, property prediction, and binding affinity prediction. However, the chemical characteristics and significance of these building blocks, the chemical words, remain unexplored. To address this gap, we employ data-driven SMILES tokenization techniques such as Byte Pair Encoding, WordPiece, and Unigram to identify chemical words and compare the resulting vocabularies. To understand the chemical significance of these words, we build a language-inspired pipeline that treats the high-affinity ligands of protein targets as documents and selects the key chemical words making up those ligands based on tf-idf weighting. Experiments on multiple protein-ligand affinity datasets show that, despite differences in words, lengths, and validity among the vocabularies generated by different subword tokenization algorithms, the identified key chemical words exhibit similarity. Further, we conduct case studies on a number of targets to analyze the impact of key chemical words on binding. We find that these key chemical words are specific to protein targets and correspond to known pharmacophores and functional groups. Our approach elucidates chemical properties of the words identified by machine learning models and can be used in drug discovery studies to determine significant chemical moieties.
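The tf-idf selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which tf-idf variant is used, so the sketch assumes length-normalized term frequency and an unsmoothed logarithmic inverse document frequency. The function name `tfidf_key_words`, the target names, and the toy chemical words are all hypothetical, and the words are assumed to have already been produced by a subword tokenizer.

```python
import math
from collections import Counter


def tfidf_key_words(documents, top_k=3):
    """Rank chemical words per target by tf-idf.

    documents: dict mapping a target name to the list of chemical words
    obtained from that target's high-affinity ligands (one "document"
    per target). Assumed variant: tf = count / document length,
    idf = log(N / document frequency).
    """
    n_docs = len(documents)

    # Document frequency: in how many targets' documents each word occurs.
    df = Counter()
    for words in documents.values():
        df.update(set(words))

    key_words = {}
    for target, words in documents.items():
        counts = Counter(words)
        total = len(words)
        scores = {
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in counts.items()
        }
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        key_words[target] = ranked[:top_k]
    return key_words


# Hypothetical toy data: already-tokenized ligands, pooled per target.
documents = {
    "target_A": ["c1ccccc1", "C(=O)N", "S(=O)(=O)N", "c1ccccc1", "S(=O)(=O)N"],
    "target_B": ["c1ccccc1", "C(=O)O", "C(=O)O", "CCN"],
    "target_C": ["c1ccccc1", "C(=O)N", "c1ccncc1", "c1ccncc1"],
}

result = tfidf_key_words(documents, top_k=3)
for target, ranked in result.items():
    print(target, [(word, round(score, 3)) for word, score in ranked])
```

In this toy setting the benzene ring word occurs in every document, so its idf and score are zero, while words confined to one target's ligands rank highest. This mirrors the intuition that key chemical words are target-specific.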
Source (DOI): http://dx.doi.org/10.1002/minf.202300249
J Am Chem Soc
January 2025
Department of Chemistry, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul 02841, South Korea.
Epoxides are versatile chemical intermediates used in the manufacture of diverse industrial products. For decades, thermochemical conversion has been employed as the primary synthetic route. However, it has several drawbacks, such as harsh and explosive operating conditions and significant greenhouse gas emissions.
J Am Chem Soc
January 2025
State Key Laboratory of Medicinal Chemical Biology, Frontiers Science Centre for New Organic Matter, Tianjin Key Laboratory of Biosensing and Molecular Recognition, Research Centre for Analytical Sciences, College of Chemistry, School of Medicine and Frontiers Science Center for Cell Responses, Nankai University, Tianjin 300071, P. R. China.
Carbon monoxide (CO) gas therapy, as an emerging therapeutic strategy, is promising in tumor treatment. However, the development of a red or near-infrared light-driven efficient CO release strategy is still challenging due to the limited physicochemical characteristics of the photoactivated carbon monoxide-releasing molecules (photoCORMs). Here, we discovered a novel photorelease CO mechanism that involved dual pathways of CO release via photosensitization.
ACS Nano
January 2025
Battery and Electrochemistry Laboratory (BELLA), Institute of Nanotechnology, Karlsruhe Institute of Technology (KIT), Kaiserstr. 12, Karlsruhe 76131, Germany.
Improving interfacial stability between cathode active material (CAM) and solid electrolyte (SE) is vital for developing high-performance all-solid-state batteries (ASSBs), with compatibility issues among the cell components representing a major challenge. CAM surface coating with a chemically inert ion conductor is a promising approach to suppress side reactions occurring at the cathode interfaces. Another strategy to mitigate mechanical degradation involves utilizing single-crystalline particle morphologies.
ACS Appl Mater Interfaces
January 2025
CAS Key Laboratory of Green Process and Engineering, Institute of Process Engineering, Chinese Academy of Sciences, Beijing 100190, China.
Maintaining human body temperature in both high and low-temperature environments is fundamental to human survival, necessitating high-performance thermal insulation materials to prevent heat exchange with the external environment. Currently, most fibrous thermal insulation materials are characterized by large weight, suboptimal thermal insulation, and inferior mechanical and waterproof performance, thereby limiting their effectiveness in providing thermal protection for the human body. In this study, lightweight, waterproof, mechanically robust, and thermal insulating polyamide-imide (PAI) grooved micro/nanofibrous aerogels were efficiently and directly assembled by electrospinning.
Acc Chem Res
January 2025
Department of Chemistry, Shanghai Key Laboratory of Catalysis and Innovative Materials, Center of Chemistry for Energy Materials Shanghai, Fudan University, Shanghai 200433, PR China.
Conspectus: Zinc metal batteries (ZMBs) appear to be promising candidates to replace lithium-ion batteries owing to their higher safety and lower cost. Moreover, natural reserves of Zn are abundant, being approximately 300 times greater than those of Li. However, there are some typical issues impeding the wide application of ZMBs.