Although gas chromatography-mass spectrometry (GC-MS) has long been used to identify compounds in complex mixtures, the identification process is often subjective and time-consuming, and it leaves a large fraction of seemingly good-quality spectra unidentified. In this work, we describe a set of new mass spectral library-based methods to assist compound identification in complex mixtures. These methods use the mass spectral uniqueness and compound ubiquity of library entries, together with noise reduction and automated comparison of retention indices with those of library compounds. As a test data set, we used a publicly available electron ionization mass spectrometry (EI-MS) data set of 4833 spectra of particulate organic compounds emitted by the combustion of wildland fuels. Spectra in this data set were first identified with the NIST 2023 EI-MS Library and its associated batch identification software (NIST MS PepSearch), using retention-index-corrected Identity Search scoring. The resulting identifications and related information were then used to parametrize other factors that correlate with identification. We also illustrate a method that uses the Hybrid Similarity Search to identify compounds that are absent from mass spectral libraries but related to compounds present in them. Nevertheless, some 90% of the spectra remain unidentified. By comparing the unidentified and identified mass spectra in this data set, we developed a simple new measure, the median relative abundance, for evaluating the likelihood of identification.
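The abstract does not give formulas for the median relative abundance or for the retention indices it compares. As a minimal sketch, assuming that "relative abundance" means each peak intensity rescaled so the base peak equals 999 (the NIST library convention) and using the standard linear (van den Dool and Kratz) retention index for temperature-programmed GC, the two quantities could be computed as below. The function names and input formats are hypothetical; this is not the authors' implementation and not the NIST MS PepSearch interface.

```python
from bisect import bisect_right
from statistics import median


def median_relative_abundance(peaks, base_scale=999.0):
    """Median relative abundance of a centroided EI mass spectrum (hypothetical helper).

    Assumption: `peaks` is a sequence of (m/z, intensity) pairs, and each
    relative abundance is the intensity rescaled so that the base peak equals
    `base_scale` (999 follows the NIST library convention; 100 is also common).
    Zero-intensity entries are ignored.
    """
    intensities = [intensity for _, intensity in peaks if intensity > 0]
    if not intensities:
        raise ValueError("spectrum has no peaks with positive intensity")
    base_peak = max(intensities)
    relative = [intensity / base_peak * base_scale for intensity in intensities]
    return median(relative)


def linear_retention_index(rt, alkanes):
    """Linear (van den Dool and Kratz) retention index (hypothetical helper).

    `alkanes` is a list of (carbon_number, retention_time) pairs for the
    n-alkane ladder, sorted by retention time; `rt` must fall within it.
    """
    times = [t for _, t in alkanes]
    i = bisect_right(times, rt) - 1
    # A compound co-eluting with the last alkane still uses the final interval.
    if i == len(alkanes) - 1 and rt == times[-1]:
        i -= 1
    if i < 0 or i >= len(alkanes) - 1:
        raise ValueError("retention time lies outside the alkane ladder")
    (n, t_n), (n_next, t_next) = alkanes[i], alkanes[i + 1]
    fraction = (rt - t_n) / (t_next - t_n)
    # Interpolate linearly between the bracketing alkanes (100 units per carbon).
    return 100.0 * (n + (n_next - n) * fraction)
```

For example, a four-peak spectrum with intensities 120, 999, 500 and 50 gives a median relative abundance of approximately 310 (the mean of the two middle values, 120 and 500), and with C10 eluting at 5.0 min and C11 at 6.0 min, a compound eluting at 5.5 min receives an index of 1050. The paper's exact definition of the measure, any m/z range or intensity threshold it applies, and the retention-index tolerance used in the NIST scoring are not stated in the abstract and may differ from these assumptions.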

Source: http://dx.doi.org/10.1021/jasms.4c00451

Similar Publications

Genome assembly of the grassland caterpillar Gynaephora qinghaiensis.

Sci Data

January 2025

State Key Laboratory of Rice Biology, Ministry of Agricultural and Rural Affairs Key Laboratory of Molecular Biology of Crop Pathogens and Insects, Institute of Insect Sciences, Zhejiang University, Hangzhou, 310058, China.

The grassland caterpillars are the most damaging insect pests to the alpine meadow of the Qinghai-Tibetan Plateau in China. In this study, we present a genome assembly of one grassland caterpillar Gynaephora qinghaiensis by using Oxford Nanopore long-read and BGI short-read sequencing. The genome assembly of 861.

The question of what processes can take place without conscious awareness has generated extensive research. Yet there is still no consensus regarding the extent and scope of unconscious processing, and past research abounds with conflicting results. A possible reason for this lack of consensus is the diversity of methods in the field, as the methodological choices might influence the results.

Functional near-infrared spectroscopy (fNIRS) is an increasingly popular neuroimaging technique that measures cortical hemodynamic activity in a non-invasive and portable fashion. Although the fNIRS community has been successful in disseminating open-source processing tools and a standard file format (SNIRF), reproducible research and sharing of fNIRS data amongst researchers has been hindered by a lack of standards and clarity over how study data should be organized and stored. This problem is not new in neuroimaging, and it became evident years ago with the proliferation of publicly available neuroimaging datasets.

A global dataset of freshwater fish trophic interactions.

Sci Data

January 2025

University of South Dakota, Department of Biology, Vermillion, SD, 57069, USA.

Freshwater management and research frequently rely on trophic data to manage freshwater fishes, yet it is difficult to perform a simple search of dietary information for any one species. FishBase represents the largest effort to organize freshwater dietary data into a singular, navigable dataset. Nonetheless, FishBase excludes a large portion of the ecological literature because it was developed before the creation of most modern scientific search engines.

Multiple Myeloma (MM) is a cytogenetically heterogeneous clonal plasma cell proliferative disease whose diagnosis is supported by analyses on histological slides of bone marrow aspirate. In summary, experts use a labor-intensive methodology to compute the ratio between plasma cells and non-plasma cells. Therefore, the key aspect of the methodology is identifying these cells, which relies on the experts' attention and experience.
