Tandem mass spectrometry (MS/MS) shows great promise in metabolomics research because it provides abundant information on compounds. With the rapid development of mass spectrometric techniques, a large number of MS/MS spectral data sets have been produced under different experimental settings. This volume of data poses major challenges for spectral analysis tasks, including compound identification and spectra clustering. The core challenge in MS/MS spectral analysis is how to describe a spectrum quantitatively and effectively. Recently, emerging deep-learning-based technologies have brought new opportunities to address this challenge by producing high-quality representations of MS/MS spectra. In this study, we propose CLERMS, a novel transformer-based contrastive learning method for representing MS/MS spectra. Specifically, we propose an optimized model architecture equipped with a sinusoidal embedder, together with a novel loss function that combines InfoNCE loss and mean squared error (MSE) loss, to obtain informative embeddings from both the peak information and the metadata. We evaluate our method on a GNPS data set, and the results demonstrate that the learned embeddings not only distinguish spectra of different compounds but also reveal the structural similarity between them. Additionally, in comparisons with other methods on compound identification and spectra clustering, our method achieves significantly better results.
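The abstract names two technical components, a sinusoidal embedder for peaks and a loss that combines InfoNCE with MSE, without giving their exact form. The following is a minimal NumPy sketch of how such components are commonly built, written under our own assumptions rather than taken from the CLERMS code: m/z values are encoded with transformer-style sine/cosine features over geometrically spaced wavelengths (the `min_wavelength` and `max_wavelength` bounds are hypothetical), the InfoNCE term treats row i of `positives` as the match for row i of `anchors` and every other row as an in-batch negative, and the MSE term is assumed to regress pairwise cosine similarity onto a hypothetical target similarity matrix `target_sim` (for example, a structural similarity score). All function names and the weight `alpha` are illustrative.

```python
import numpy as np


def sinusoidal_embed(mz, dim=32, min_wavelength=0.001, max_wavelength=10000.0):
    """Encode each m/z value as sine/cosine features (hypothetical parameters).

    Wavelengths are spaced geometrically between min_wavelength and
    max_wavelength, so short wavelengths resolve small mass differences and
    long ones capture coarse position on the m/z axis.
    """
    mz = np.asarray(mz, dtype=float)[:, None]  # shape (n_peaks, 1)
    half = dim // 2
    steps = np.arange(half) / max(half - 1, 1)  # evenly spaced from 0 to 1
    wavelengths = min_wavelength * (max_wavelength / min_wavelength) ** steps
    angles = 2.0 * np.pi * mz / wavelengths  # shape (n_peaks, half)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)


def _l2_normalize(x):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE with in-batch negatives: row i of positives matches row i of anchors."""
    logits = _l2_normalize(anchors) @ _l2_normalize(positives).T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))  # diagonal holds the matched pairs


def similarity_mse(anchors, positives, target_sim):
    """Assumed MSE term: pairwise cosine similarity vs. a target similarity matrix."""
    cos = _l2_normalize(anchors) @ _l2_normalize(positives).T
    return np.mean((cos - target_sim) ** 2)


def combined_loss(anchors, positives, target_sim, alpha=1.0, temperature=0.1):
    """Weighted sum of the two terms; alpha is a hypothetical weight."""
    return (info_nce(anchors, positives, temperature)
            + alpha * similarity_mse(anchors, positives, target_sim))


# Toy usage: four orthogonal embeddings, each paired with itself.
emb = np.eye(4)
print(combined_loss(emb, emb, target_sim=np.eye(4)))  # small: pairs are matched
print(info_nce(emb, np.roll(emb, 1, axis=0)))  # about 10: pairs are mismatched
```

With this formulation, lowering `temperature` sharpens the contrast between each matched pair and its in-batch negatives, while `alpha` trades that discrimination off against how closely embedding similarity tracks the target similarity.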
Source (DOI): http://dx.doi.org/10.1021/acs.analchem.3c00260