While gas chromatography-mass spectrometry (GC-MS) has long been used to identify compounds in complex mixtures, the process is often subjective and time-consuming, and it leaves a large fraction of seemingly good-quality spectra unidentified. In this work, we describe a set of new mass spectral library-based methods to assist compound identification in complex mixtures. These methods employ the mass spectral uniqueness and compound ubiquity of library entries, alongside noise reduction and automated comparison of retention indices to those of library compounds. As a test data set, we used a publicly available electron ionization mass spectrometry data set consisting of 4833 spectra of particulate organic compounds emitted by combustion of wildland fuels. Spectra in this data set were first identified using the NIST 2023 EI-MS Library and the associated batch-processing identification software (NIST MS PepSearch) with retention-index-corrected Identity Search scoring. The resulting identifications and related information were then used to parametrize other factors that correlate with identification. A method for identifying compounds that are absent from mass spectral libraries but related to compounds present in them, using the Hybrid Similarity Search, is illustrated. Nevertheless, some 90% of the spectra remain unidentified. By comparing unidentified with identified mass spectra in this data set, a new, simple measure, the median relative abundance, was developed for evaluating the likelihood of identification.
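The abstract names median relative abundance as a measure but does not give its formula or say how the value is used. The Python sketch below shows one plausible reading, assuming that each peak's relative abundance is its intensity scaled so the base peak equals 100, and that the measure is the median of those values over all peaks with positive intensity. The function name `median_relative_abundance`, the `scale` parameter, and the sample intensities are hypothetical and are not taken from the paper.

```python
from statistics import median


def median_relative_abundance(intensities, scale=100.0):
    """Median of peak abundances after normalizing to the base peak.

    Assumption (not from the source): relative abundance is
    intensity / base-peak intensity * scale, over positive peaks only.
    """
    peaks = [float(i) for i in intensities if i > 0]
    if not peaks:
        raise ValueError("spectrum has no peaks with positive intensity")
    base = max(peaks)  # base peak = most intense peak in the spectrum
    return median([p * scale / base for p in peaks])


# Hypothetical spectra: one dominated by a single peak, one with many
# peaks of comparable height.
sparse_spectrum = [1000, 50, 20, 10, 5]
flat_spectrum = [1000, 800, 700, 600, 500]

print(median_relative_abundance(sparse_spectrum))
print(median_relative_abundance(flat_spectrum))
```

Under these assumptions the value depends only on the shape of the spectrum, not on its absolute signal level, so spectra recorded at different intensities can be compared directly.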
DOI: http://dx.doi.org/10.1021/jasms.4c00451
Sci Data
January 2025
State Key Laboratory of Rice Biology, Ministry of Agricultural and Rural Affairs Key Laboratory of Molecular Biology of Crop Pathogens and Insects, Institute of Insect Sciences, Zhejiang University, Hangzhou, 310058, China.
The grassland caterpillars are the most damaging insect pests to the alpine meadow of the Qinghai-Tibetan Plateau in China. In this study, we present a genome assembly of one grassland caterpillar Gynaephora qinghaiensis by using Oxford Nanopore long-read and BGI short-read sequencing. The genome assembly of 861.
Sci Data
January 2025
School of Psychological Sciences, Tel Aviv University, Tel Aviv, Israel.
The question of what processes can take place without conscious awareness has generated extensive research. Yet there is still no consensus regarding the extent and scope of unconscious processing, and past research abounds with conflicting results. A possible reason for this lack of consensus is the diversity of methods in the field, as the methodological choices might influence the results.
Sci Data
January 2025
Department of Engineering Technology, University of Houston, Houston, TX, USA.
Functional near-infrared spectroscopy (fNIRS) is an increasingly popular neuroimaging technique that measures cortical hemodynamic activity in a non-invasive and portable fashion. Although the fNIRS community has been successful in disseminating open-source processing tools and a standard file format (SNIRF), reproducible research and sharing of fNIRS data amongst researchers has been hindered by a lack of standards and clarity over how study data should be organized and stored. This problem is not new in neuroimaging, and it became evident years ago with the proliferation of publicly available neuroimaging datasets.
Sci Data
January 2025
University of South Dakota, Department of Biology, Vermillion, SD, 57069, USA.
Freshwater management and research frequently rely on trophic data to manage freshwater fishes, yet it is difficult to perform a simple search of dietary information for any one species. FishBase represents the largest effort to organize freshwater dietary data into a singular, navigable dataset. Nonetheless, FishBase excludes a large portion of the ecological literature because it was developed before the creation of most modern scientific search engines.
Sci Data
January 2025
Federal University of Bahia, Institute of Computing, Salvador, 40170-110, Brazil.
Multiple Myeloma (MM) is a cytogenetically heterogeneous clonal plasma cell proliferative disease whose diagnosis is supported by analyses on histological slides of bone marrow aspirate. In summary, experts use a labor-intensive methodology to compute the ratio between plasma cells and non-plasma cells. Therefore, the key aspect of the methodology is identifying these cells, which relies on the experts' attention and experience.