GT-NMR: a novel graph transformer-based approach for accurate prediction of NMR chemical shifts.

J Cheminform

Fujian Provincial Key Laboratory for Theoretical and Computational Chemistry, Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, People's Republic of China.

Published: November 2024

In this work, inspired by the graph transformer, we present an improved protocol, termed GT-NMR, which integrates a 2D molecular graph representation with the Transformer architecture for accurate yet efficient prediction of NMR chemical shifts. The effectiveness of GT-NMR was thoroughly examined on the standard nmrshiftdb2 dataset, 37 natural products, and the structural elucidation of 11 pairs of natural products. Systematic analysis affirms that GT-NMR outperforms traditional graph-based methods in all aspects, achieving state-of-the-art performance, with mean absolute errors of 0.158 and 1.189 ppm in predicting ¹H and ¹³C NMR chemical shifts, respectively, on the standard nmrshiftdb2 dataset. Further scrutiny of its practical applications indicates that GT-NMR's efficacy is closely tied to molecular complexity, as quantified by the size-normalized spatial score (nSPS). For relatively simple molecules (nSPS ≤ 27.71), GT-NMR performs comparably to the best density functional, while its effectiveness diminishes significantly for complex molecules characterized by higher nSPS values (nSPS ≥ 38.42). This trend is consistent across other graph-based NMR chemical shift prediction methods as well. Therefore, when employing GT-NMR or other graph-based methods for the rapid and routine prediction of NMR chemical shifts, it is advisable to use nSPS to assess their suitability. The source code and trained model of GT-NMR are publicly available on GitHub.

Scientific contribution: GT-NMR, which combines the 2D molecular graph representation with the Transformer architecture, was implemented for the first time to predict atom-level NMR chemical shifts, achieving state-of-the-art performance. More importantly, the reliability of GT-NMR and other graph-based methods was assessed for the first time in terms of molecular complexity, as quantified by the size-normalized spatial score (nSPS). Systematic scrutiny demonstrated that GT-NMR offers a valuable tool for routine structural screening and elucidation of relatively simple molecules.
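The two quantities the abstract leans on, the mean absolute error (MAE) of predicted shifts and the nSPS cutoffs, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names and the regime labels are hypothetical, the nSPS value is assumed to be computed beforehand with a cheminformatics toolkit, and the "intermediate" band between the two quoted thresholds is an assumption, since the abstract only describes behavior at or below 27.71 and at or above 38.42.

```python
from statistics import fmean

# Thresholds quoted in the abstract (nSPS values).
NSPS_SIMPLE_MAX = 27.71
NSPS_COMPLEX_MIN = 38.42


def mean_absolute_error(predicted, reference):
    """Return the mean absolute error (in ppm) between two equal-length shift lists."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    return fmean(abs(p - r) for p, r in zip(predicted, reference))


def nsps_regime(nsps):
    """Map a precomputed nSPS value to a coarse label (labels are hypothetical)."""
    if nsps <= NSPS_SIMPLE_MAX:
        return "simple"        # graph-based prediction reported as comparable to DFT
    if nsps >= NSPS_COMPLEX_MIN:
        return "complex"       # effectiveness reported to diminish significantly
    return "intermediate"      # assumption: the abstract does not characterize this band


if __name__ == "__main__":
    # Made-up shifts, for illustration only.
    print(mean_absolute_error([1.0, 2.0], [1.5, 1.0]))
    print(nsps_regime(30.0))
```

In a screening workflow, a check like `nsps_regime` would run before prediction, so that molecules in the "complex" regime are routed to a more expensive method rather than trusted to a graph-based model.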

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11590296
DOI: http://dx.doi.org/10.1186/s13321-024-00927-9
