The majority of tandem mass spectrometry (MS/MS) spectra in untargeted metabolomics and exposomics studies lack any annotation. Our deep learning framework, Integrated Data Science Laboratory for Metabolomics and Exposomics-Mass INTerpreter (IDSL_MINT), can translate MS/MS spectra into molecular fingerprint descriptors. IDSL_MINT allows users to leverage the power of the transformer model for mass spectrometry data, similar to large language models.
Poor chemical annotation of high-resolution mass spectrometry data limits applications of untargeted metabolomics datasets. Our new software, the Integrated Data Science Laboratory for Metabolomics and Exposomics - Composite Spectra Analysis (IDSL.CSA) R package, generates composite mass spectra libraries from MS1-only data, enabling the chemical annotation of high-resolution mass spectrometry coupled with liquid chromatography peaks regardless of the availability of MS2 fragmentation spectra.
Poor chemical annotation of high-resolution mass spectrometry data limits applications of untargeted metabolomics datasets. Our new software, the Integrated Data Science Laboratory for Metabolomics and Exposomics - Composite Spectra Analysis (IDSL.CSA) R package, generates composite mass spectra libraries from MS1-only data, enabling the chemical annotation of LC/HRMS peaks regardless of the availability of MS2 fragmentation spectra.
The bioaccumulation and biomagnification of perfluoroalkyl substances (PFAS) in the Lake Erie food web were investigated by analyzing surface water and biological samples, including 10 taxa of fish species and 2 taxa of benthos and zooplankton. The carbon (δ¹³C) and nitrogen (δ¹⁵N) isotopic composition and fatty acid profiles of biological samples were used to evaluate the food web structure and assess the biomagnification of PFAS. Perfluorooctane sulfonate (PFOS) dominated the total PFAS (ΣPFAS) concentration (50-90% of ΣPFAS concentration), followed by C9-C11 perfluorinated carboxylic acids (PFCAs).
Untargeted liquid chromatography/high-resolution mass spectrometry (LC/HRMS) assays in metabolomics and exposomics aim to characterize the small molecule chemical space in a biospecimen. To gain maximum biological insights from these data sets, LC/HRMS peaks should be annotated with chemical and functional information including molecular formula, structure, chemical class, and metabolic pathways. Among these, molecular formulas may be assigned to LC/HRMS peaks through matching theoretical and observed isotopic profiles (MS1) of the underlying ionized compound.
Generating comprehensive and high-fidelity metabolomics data matrices from LC/HRMS data remains extremely challenging for large population-scale studies (> 200). Here, we present a new data processing pipeline, the Intrinsic Peak Analysis (IDSL.IPA) R package (https://ipa.
Inter-chemical correlations in metabolomics and exposomics datasets provide valuable information for studying relationships among chemicals reported for human specimens. As the number of compounds in these datasets increases, network graph analysis and visualization of the correlation structure become difficult to interpret. We have developed the Chemical Correlation Database (CCDB) as a systematic catalogue of inter-chemical correlations in publicly available metabolomics and exposomics studies.
Polyfluoroalkyl substances (PFAS) are a group of fluorinated organic chemicals that have been produced for industrial and commercial application since the 1950s. PFAS are highly persistent and ubiquitous in water, sediment, and biota. Toxic effects of PFAS on humans and the ecosystem have increased scientific and public concern.
An untargeted chemical analysis of bio-fluids provides semi-quantitative data for thousands of chemicals, expanding our understanding of relationships among metabolic pathways, diseases, phenotypes and exposures. During the processing of mass spectral and chromatography data, various signal thresholds are used to control the number of peaks in the final data matrix that is used for statistical analyses. However, commonly used stringent thresholds generate constrained data matrices which may under-represent the detected chemical space, leading to missed biological insights in exposome research.
Sport fish fillets and human sera (fish consumers) were collected in the Lake Superior and Lake Michigan basin and screened for novel contaminants using the isotopic profile deconvoluted chromatogram (IPDC) algorithm. The IPDC algorithm was extended beyond traditional Cl/Br filters to detect additional potentially persistent, bioaccumulative, and toxic (PBT) compounds such as perfluoroalkyl substances (PFAS). The IPDC algorithm screened for approximately 13.
Legacy halogenated contaminants have been monitored in the Great Lakes for decades, but there are many additional unknown halogenated contaminants potentially affecting the Great Lakes ecosystem. To address this concern, lake trout were collected in 2005/2006 and 2015/2016 from each lake and screened for previously unidentified compounds. The isotopic profile deconvoluted chromatogram algorithm was used to isolate unknown halogenated components using high-resolution mass spectrometry data files generated by an atmospheric pressure gas chromatography-quadrupole time-of-flight mass spectrometer operated in positive and negative modes.
An isotopic profile matching algorithm, the isotopic profile deconvoluted chromatogram (IPDC), was developed to screen for a wide variety of organic compounds in high-resolution mass spectrometry (HRMS) data acquired from instruments with resolving power as low as 22,000 FWHM. The algorithm initiates the screening process by generating a series of C/Br/Cl/S isotopic patterns consistent with the profiles of approximately 3 million molecular formulas for compounds with potentially persistent, bioaccumulative, and toxic (PBT) properties. To evaluate this algorithm, HRMS data were screened using these seed profiles to isolate relevant chlorinated and/or brominated compounds.
Byproducts formed during the treatment of perfluorooctanoic acid (PFOA) and perfluorooctanesulfonate (PFOS) in water by a plasma treatment process were quantified. The process was intentionally operated to treat these compounds slowly so that byproducts could accumulate. Several linear chain perfluoroalkyl carboxylic acids (PFCAs) (C4 to C7) were identified as byproducts of both PFOA and PFOS treatment. PFOA, perfluorohexanesulfonate (PFHxS), and perfluorobutanesulfonate (PFBS) were also found to be byproducts of PFOS degradation.
A versatile screening algorithm capable of efficiently searching liquid chromatographic/mass spectrometric data for unknown compounds has been developed using a combination of open source and generic computing software packages. The script was used to search for select novel polyfluorinated contaminants in Great Lakes fish. However, the framework is applicable whenever full-scan, high-resolution mass spectral and chromatographic data are collected.