QC-GNoMS: a Graph Neural Net for High Resolution Mass Spectra Prediction.

J Chem Inf Model

Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States.

Published: August 2024

Predicting the mass spectrum of a molecular ion is often accomplished via three general approaches: rules-based bond-breaking methods, deep learning, and quantum chemical (QC) modeling. Rules-based approaches are often limited by the conditions defined for different chemical subspaces and perform poorly in chemical regimes with few defined rules. QC modeling is theoretically robust but requires substantial computational time to produce a spectrum for each target. Among deep learning techniques, graph neural networks (GNNs) have outperformed earlier fingerprint-based neural networks for mass spectra prediction. To explore this technique further, we investigate whether including quantum chemically derived information as edge features in the GNN increases predictive accuracy. The edge features we investigated include categorical bond order, bond force constants derived from extended tight-binding (xTB) quantum chemistry, and acyclic bond dissociation energies. We evaluated these models against a control GNN with no edge features in the input graphs. Bond dissociation enthalpies yielded the largest improvement, with a cosine similarity score of 0.462 compared with 0.437 for the baseline model. In this work, we also apply dynamic graph attention, which improves performance on benchmark problems and supports the inclusion of edge features. Across implementations, we investigate the nature of the molecular embedding for spectra prediction and discuss the recognition of fragment topographies in distinct chemistries to guide further development of tandem mass spectrometry prediction.
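The cosine similarity scores quoted above compare a predicted spectrum with a reference spectrum treated as intensity vectors over m/z. A minimal sketch, assuming peaks are binned on a fixed m/z grid: the 0.01 bin width, the 1000 m/z ceiling, the toy peak lists, and the helper names `bin_spectrum` and `cosine_similarity` are illustrative assumptions, not the authors' settings. It uses the plain unweighted cosine; the paper may use a different binning or a weighted variant.

```python
import math

# Assumed binning parameters, for illustration only; the abstract does not
# state how the high-resolution spectra were binned.
BIN_WIDTH = 0.01
MAX_MZ = 1000.0


def bin_spectrum(peaks, bin_width=BIN_WIDTH, max_mz=MAX_MZ):
    """Map (m/z, intensity) peaks to {bin index: summed intensity}."""
    binned = {}
    for mz, intensity in peaks:
        if 0.0 <= mz < max_mz:
            idx = int(mz / bin_width)
            binned[idx] = binned.get(idx, 0.0) + intensity
    return binned


def cosine_similarity(a, b):
    """Unweighted cosine similarity between two sparse binned spectra."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum is treated as matching nothing
    return dot / (norm_a * norm_b)


# Hypothetical toy spectra: one shared peak, one unmatched peak each.
predicted = bin_spectrum([(91.054, 100.0), (65.039, 20.0)])
reference = bin_spectrum([(91.054, 80.0), (77.039, 10.0)])
score = cosine_similarity(predicted, reference)

# Relative gain implied by the two scores reported in the abstract.
relative_gain = (0.462 - 0.437) / 0.437
```

On the abstract's own numbers, moving from 0.437 to 0.462 is an absolute gain of 0.025, or roughly 5.7% relative to the baseline. A sparse dictionary keeps the comparison cheap even when a fine bin width produces a very long nominal vector.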
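The abstract pairs dynamic graph attention with bond-level edge features. A minimal single-head sketch of how the two can combine, assuming "dynamic graph attention" refers to the GATv2 formulation, where the LeakyReLU is applied before the dot product with the attention vector so that the ranking of neighbors depends on the receiving atom, and assuming projected edge features are simply added inside that nonlinearity. The function names, weight matrices, the edge-feature layout (one-hot single/double bond order plus a normalized bond dissociation energy), and all numbers are hypothetical; QC-GNoMS may combine edge features differently.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def leaky_relu(v, slope=0.2):
    return [x if x > 0 else slope * x for x in v]


def gatv2_edge_layer(h, edges, edge_attr, W_src, W_dst, W_edge, a):
    """Hypothetical single-head GATv2-style layer with additive edge features.

    h:         node feature vectors (one per atom)
    edges:     directed (src, dst) pairs; messages flow src -> dst
    edge_attr: edge feature vectors aligned with `edges`
    """
    src_proj = [matvec(W_src, x) for x in h]
    dst_proj = [matvec(W_dst, x) for x in h]

    # Logit per edge: a . LeakyReLU(W_src h_j + W_dst h_i + W_edge e_ji).
    # The nonlinearity sits before the dot with `a`, so the neighbor
    # ranking can change with the receiving node ("dynamic" attention).
    logits = []
    for (j, i), e in zip(edges, edge_attr):
        z = [s + d + p for s, d, p in zip(src_proj[j], dst_proj[i], matvec(W_edge, e))]
        logits.append(sum(ak * zk for ak, zk in zip(a, leaky_relu(z))))

    out = [[0.0] * len(a) for _ in h]
    for i in range(len(h)):
        incoming = [k for k, (_, dst) in enumerate(edges) if dst == i]
        if not incoming:
            continue  # isolated node: keep a zero vector
        m = max(logits[k] for k in incoming)  # subtract max for numerical stability
        weights = [math.exp(logits[k] - m) for k in incoming]
        total = sum(weights)
        for k, w in zip(incoming, weights):
            alpha = w / total  # softmax over this node's incoming edges
            out[i] = [o + alpha * v for o, v in zip(out[i], src_proj[edges[k][0]])]
    return out


# Hypothetical 3-atom chain 0-1-2: bond 0-1 single, bond 1-2 double.
# Edge features: [is_single, is_double, normalized BDE] (values made up).
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
edge_attr = [[1.0, 0.0, 0.4], [1.0, 0.0, 0.4], [0.0, 1.0, 0.7], [0.0, 1.0, 0.7]]
identity = [[1.0, 0.0], [0.0, 1.0]]
W_edge = [[0.1, 0.0, 0.5], [0.0, 0.1, 0.5]]
a = [1.0, -1.0]
out = gatv2_edge_layer(h, edges, edge_attr, identity, identity, W_edge, a)
```

Swapping in a different scalar, such as an xTB force constant instead of a dissociation energy, changes only the last column of `edge_attr`; the layer itself is unchanged, which is one reason attention layers that accept edge features suit this kind of comparison.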

Source: http://dx.doi.org/10.1021/acs.jcim.4c00446

Similar Publications

Target-sample-wise deviations in measurement profiles (e.g., spectra) relative to calibration samples impede linear calibration models from accurately predicting target sample analyte amounts.

Hydroxysilylene (HSi-OH) in the gas phase.

J Chem Phys

January 2025

Ideal Vacuum Products, LLC, 5910 Midway Park Blvd. NE, Albuquerque, New Mexico 87109, USA.

The hydroxysilylene (HSiOH) molecule has been spectroscopically identified in the gas phase for the first time. This highly reactive species was produced in a twin electric discharge jet using separate precursor streams of ¹⁶O₂/¹⁸O₂ and Si₂H₆/Si₂D₆, both diluted in high-pressure argon. The strongest and most stable laser-induced fluorescence (LIF) signals were obtained by applying an electric discharge to each of the precursor streams and then merging the discharge products just prior to expansion into vacuum.

Adsorption isotherms in roasted specialty coffee ( L.): Dataset and statistical tools for optimizing storage conditions and enhancing shelf life.

Data Brief

February 2025

Centro Surcolombiano de Investigación en Café (CESURCAFÉ), Departamento de Ingeniería Agrícola, Universidad Surcolombiana, Neiva-Huila 410001, Colombia.

This work presents a comprehensive dataset of adsorption isotherms and infrared spectral data for roasted specialty coffee ( L.). The dataset includes adsorption isotherms for whole roasted beans and ground coffee at medium (850 µm) and fine (600 µm) particle sizes.

Mid-infrared spectra of dried and roasted cocoa (Theobroma cacao L.): A dataset for machine learning-based classification of cocoa varieties and prediction of theobromine and caffeine content.

Data Brief

February 2025

Centro Surcolombiano de Investigación en Café (CESURCAFÉ), Departamento de Ingeniería Agrícola, Universidad Surcolombiana, Neiva-Huila 410001, Colombia.

This paper presents a comprehensive dataset of mid-infrared spectra for dried and roasted cocoa beans (Theobroma cacao L.), along with their corresponding theobromine and caffeine content. Infrared data were acquired using Attenuated Total Reflectance-Fourier Transform Infrared (ATR-FTIR) spectroscopy, while High-Performance Liquid Chromatography (HPLC) was employed to accurately quantify theobromine and caffeine in the dried cocoa beans.

Within the context of polypropylene recycling by dissolution, the potential degradation of polypropylene in solution has been investigated using in situ NIR and Raman spectroscopy. Pure polypropylene, completely free of additives, and commercial polypropylene, low in additives, were deliberately degraded under different conditions. Genetic algorithm combined with partial least squares (GA-PLS) models were built on near-infrared (NIR) spectra, and partial least squares (PLS) models on Raman spectra, to predict the mass-average molar mass and the chain-scission rate, respectively, during the degradation process.
