Publications by authors named "Shipei Xing"

Polycyclic aromatic hydrocarbons (PAHs) are pervasive environmental pollutants with significant health risks due to their carcinogenic, mutagenic, and teratogenic properties. Traditional methods for PAH identification, primarily relying on gas chromatography-mass spectrometry (GC-MS), utilize spectral library searches together with other techniques, such as mass defect analysis. However, these methods are limited by incomplete spectral libraries and a high false positive rate.


Full-scan mass spectrometry (MS) data from both liquid chromatography (LC) and MS imaging capture multiple ion forms, including their in-source fragments. Here we leverage such fragments to structurally annotate full-scan data from LC-MS or MS imaging by matching against peak intensity scaled tandem MS spectral libraries using precursor-tolerant reverse match scoring. Applied to inflammatory bowel disease and imaging datasets, we show the approach facilitates re-analyses of data in public repositories.
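
The reverse match idea described above can be illustrated with a minimal sketch. It assumes square-root intensity scaling, a fixed m/z tolerance, and a greedy closest-peak pairing; the function name `reverse_match_score` and all parameter values are hypothetical and are not taken from the publication. "Precursor-tolerant" is modeled here simply by not filtering on precursor m/z.

```python
import math

def reverse_match_score(query, reference, tol=0.01):
    """Reverse cosine score: only reference peaks are considered.

    query, reference: lists of (mz, intensity) tuples.
    Query peaks with no counterpart in the reference are ignored, so extra
    ions in a full-scan spectrum do not lower the score.
    """
    # Square-root scaling dampens dominant peaks (an assumed scaling choice).
    ref = [(mz, math.sqrt(i)) for mz, i in reference]
    qry = [(mz, math.sqrt(i)) for mz, i in query]

    dot = 0.0
    matched_q_sq = 0.0
    used = set()
    for r_mz, r_int in ref:
        # Pick the closest unused query peak within the m/z tolerance.
        best = None
        for idx, (q_mz, _) in enumerate(qry):
            if idx in used or abs(q_mz - r_mz) > tol:
                continue
            if best is None or abs(q_mz - r_mz) < abs(qry[best][0] - r_mz):
                best = idx
        if best is not None:
            used.add(best)
            dot += r_int * qry[best][1]
            matched_q_sq += qry[best][1] ** 2

    ref_norm = math.sqrt(sum(i * i for _, i in ref))
    if dot == 0.0 or ref_norm == 0.0:
        return 0.0
    # Normalize by all reference peaks but only the matched query peaks.
    return dot / (ref_norm * math.sqrt(matched_q_sq))
```

Because the query norm covers only matched peaks, an unrelated intense ion in the full-scan spectrum leaves the score unchanged, while a reference peak missing from the query lowers it.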


Despite extensive efforts, extracting information on medication exposure from clinical records remains challenging. To complement this approach, we developed the tandem mass spectrometry (MS/MS) based GNPS Drug Library. This resource integrates MS/MS data for drugs and their metabolites/analogs with controlled vocabularies on exposure sources, pharmacologic classes, therapeutic indications, and mechanisms of action.


Feature-based molecular networking (FBMN) is a popular analysis approach for liquid chromatography-tandem mass spectrometry-based non-targeted metabolomics data. While processing liquid chromatography-tandem mass spectrometry data through FBMN is fairly streamlined, downstream data handling and statistical interrogation are often a key bottleneck. Users new to statistical analysis, in particular, struggle to handle and analyze complex data matrices effectively.

Article Synopsis
  • Understanding plant metabolites across the plant kingdom is challenging due to their vast diversity.
  • Researchers created the plantMASST reference database with data from 19,075 plant extracts, covering 246 botanical families, 1,469 genera, and 2,793 species.
  • This database enhances research on plant molecules, supporting drug discovery, biosynthesis, taxonomy, and ecology related to herbivore interactions.

The repertoire of modifications to bile acids and related steroidal lipids by host and microbial metabolism remains incompletely characterized. To address this knowledge gap, we created a reusable resource of tandem mass spectrometry (MS/MS) spectra by filtering 1.2 billion publicly available MS/MS spectra for bile-acid-selective ion patterns.


High-resolution mass spectrometry (HRMS) is a prominent analytical tool that characterizes chlorinated disinfection byproducts (Cl-DBPs) in an unbiased manner. Due to the diversity of chemicals, complex background signals, and the inherent analytical fluctuations of HRMS, conventional isotopic pattern (³⁵Cl/³⁷Cl), mass defect, and direct molecular formula (MF) prediction are insufficient for accurate recognition of the diverse Cl-DBPs in real environmental samples. This work proposes a novel strategy to recognize Cl-containing chemicals based on machine learning.
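
For context, the conventional isotopic-pattern check that the abstract calls insufficient can be sketched as follows. This is an illustration of the baseline rule only, not the machine-learning strategy of the paper. It relies on the roughly 1.99705 Da spacing between the two chlorine isotopes and the natural abundance ratio of about 0.32; for n chlorine atoms, the M+2/M intensity ratio is approximately n × 0.32 when other elements are ignored. The function name and tolerance values are hypothetical.

```python
CL_MASS_DIFF = 1.99705  # mass difference between the two Cl isotopes, in Da
CL_RATIO = 0.32         # approximate natural abundance ratio (heavy / light)

def find_cl_pairs(peaks, mz_tol=0.003, max_cl=4, ratio_tol=0.25):
    """Return (mz, n_cl) for peaks whose M+2 partner fits an n-chlorine pattern.

    peaks: list of (mz, intensity) tuples from one mass spectrum.
    mz_tol, max_cl, ratio_tol are assumed, illustrative thresholds.
    """
    hits = []
    for mz, inten in peaks:
        if inten <= 0:
            continue
        for mz2, inten2 in peaks:
            # The partner must sit one Cl isotope spacing above the candidate.
            if abs((mz2 - mz) - CL_MASS_DIFF) > mz_tol:
                continue
            ratio = inten2 / inten
            # Estimate the chlorine count from the M+2/M intensity ratio.
            n_cl = round(ratio / CL_RATIO)
            if n_cl < 1 or n_cl > max_cl:
                continue
            expected = n_cl * CL_RATIO
            if abs(ratio - expected) <= ratio_tol * expected:
                hits.append((mz, n_cl))
    return hits
```

A rule like this ignores M+2 contributions from other elements and is sensitive to intensity noise and overlapping background signals, which is consistent with the limitations the abstract points out.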


Despite the increasing availability of tandem mass spectrometry (MS/MS) community spectral libraries for untargeted metabolomics over the past decade, the majority of acquired MS/MS spectra remain uninterpreted. To further aid in interpreting unannotated spectra, we created a nearest neighbor suspect spectral library, consisting of 87,916 annotated MS/MS spectra derived from hundreds of millions of MS/MS spectra originating from published untargeted metabolomics experiments. Entries in this library, or "suspects," were derived from unannotated spectra that could be linked in a molecular network to an annotated spectrum.


Cholesterol is a critical growth substrate for Mycobacterium tuberculosis (Mtb) during infection, and the cholesterol catabolic pathway has been targeted for the development of new antimycobacterial agents. A key metabolite in cholesterol catabolism is 3aα-H-4α(3'-propanoate)-7aβ-methylhexahydro-1,5-indanedione (HIP). Many of the HIP metabolites are acyl-coenzyme A (CoA) thioesters, whose accumulation in deletion mutants can cause cholesterol-mediated toxicity.


The combination of hydrogen/deuterium (H/D) formaldehyde-based isotopic methyl labeling with solid-phase extraction and high-performance liquid chromatography-high resolution mass spectrometry (HPLC-HRMS) is a powerful analytical solution for nontargeted analysis of trace-level amino-containing chemicals in water samples. Given the huge amount of chemical information generated in HPLC-HRMS analysis, identifying all possible H/D-labeled amino chemicals presents a significant challenge in data processing. To address this, we designed a streamlined data processing pipeline that can automatically extract H/D-labeled amino chemicals from the raw HPLC-HRMS data with high accuracy and efficiency.


Colorectal cancer (CRC) is driven by genomic alterations in concert with dietary influences, with the gut microbiome implicated as an effector in disease development and progression. While meta-analyses have provided mechanistic insight into patients with CRC, study heterogeneity has limited causal associations. Using multi-omics studies on genetically controlled cohorts of mice, we identify diet as the major driver of microbial and metabolomic differences, with reductions in α diversity and widespread changes in cecal metabolites seen in high-fat diet (HFD)-fed mice.


The purity of tandem mass spectrometry (MS/MS) spectra is essential to MS/MS-based metabolite annotation and unknown exploration. This work presents an approach to cleaning chimeric MS/MS spectra generated in liquid chromatography-tandem mass spectrometry (LC-MS/MS)-based metabolomics. The assumption is that true fragments and their precursors are well correlated across the samples in a study, while false or contamination fragments are rather independent.


A substantial fraction of metabolic features remains undetermined in mass spectrometry (MS)-based metabolomics, and molecular formula annotation is the starting point for unraveling their chemical identities. Here we present bottom-up tandem MS (MS/MS) interrogation, a method for de novo formula annotation. Our approach prioritizes MS/MS-explainable formula candidates, implements machine-learned ranking and offers false discovery rate estimation.


Background: Because the human exposome contains so many substances, there is a dearth of exposure and toxicity information available to assess potential health risks. Quantification of all trace organics in biological fluids seems impossible and costly, regardless of the high individual exposure variability. We hypothesized that the blood concentration of organic pollutants could be predicted via their exposure and chemical properties.


Advancements in computer science and software engineering have greatly facilitated mass spectrometry (MS)-based untargeted metabolomics. Nowadays, gigabytes of metabolomics data are routinely generated from MS platforms, containing condensed structural and quantitative information from thousands of metabolites. Manual data processing is almost impossible due to the large data size.


Interrelating small molecules according to their aligned fragmentation spectra is central to tandem mass spectrometry-based untargeted metabolomics. Current alignment algorithms do not provide statistical significance, and compounds that have multiple delocalized structural differences often fail to have their fragment ions aligned. Here we align fragmentation spectra with both statistical significance and allowance for multiple chemical differences using Significant Interrelation of MS/MS Ions via Laplacian Embedding (SIMILE).


Extracting metabolic features from liquid chromatography-mass spectrometry (LC-MS) data has been a long-standing bioinformatic challenge in untargeted metabolomics. Conventional feature extraction algorithms fail to recognize features with low signal intensities, poor chromatographic peak shapes, or those that do not fit the parameter settings. This problem also poses a challenge for MS-based exposome studies, as low-abundant metabolic or exposomic features cannot be automatically recognized from raw data.


Collision-induced dissociation (CID) is a common fragmentation strategy in tandem mass spectrometry (MS/MS) analysis. A conventional understanding is that fragment ions generated in low-energy CID should follow the even-electron rule. As such, (de)protonated ([M+H]⁺/[M-H]⁻) or even-electron precursor ions should follow heterolytic cleavages and predominantly generate even-electron fragment ions with very few radical fragment ions (RFIs).


Extracting metabolic features from liquid chromatography-mass spectrometry (LC-MS) data relies on the recognition of extracted ion chromatogram (EIC) peak shapes using peak picking algorithms. Unfortunately, all peak picking algorithms present a significant drawback of generating a problematic number of false positives. In this work, we take advantage of deep learning technology to develop a convolutional neural network (CNN)-based program that can automatically recognize metabolic features with poor EIC shapes, which are of low feature fidelity and more likely to be false.


In-source fragmentation (ISF) is a naturally occurring phenomenon during electrospray ionization (ESI) in liquid chromatography-mass spectrometry (LC-MS) analysis. ISF leads to false metabolite annotation in untargeted metabolomics, prompting misinterpretation of the underlying biological mechanisms. Conventional metabolomic data cleaning mainly focuses on the annotation of adducts and isotopes, and the recognition of ISF features is mainly based on common neutral losses and the LC coelution pattern.


Hair is a unique biological matrix that adsorbs short-term exposures (e.g., environmental contaminants and personal care products) on its surface and also embeds endogenous metabolites and long-term exposures in its matrix.


Background: Due to the ubiquitous use of chemicals in modern society, humans are increasingly exposed to thousands of chemicals that contribute to a major portion of the human exposome. Should a comprehensive and risk-based human exposome database be created, it would be conducive to the rapid progress of human exposomics research. In addition, once a xenobiotic is biotransformed with distinct half-lives upon exposure, monitoring the parent compounds alone may not reflect the actual human exposure.


Despite the vast amount of metabolic information that can be captured in untargeted metabolomics, many biological applications are looking for a biology-driven metabolomics platform that targets a set of metabolites that are relevant to the given biological question. Steroids are a class of important molecules that play critical roles in many physiological systems and diseases. Besides known steroids, there are a large number of unknown steroids that have not been reported in the literature.


Tandem mass spectral (MS/MS) data in liquid chromatography-tandem mass spectrometry (LC-MS/MS) analysis are often contaminated as the selection of precursor ions is based on a low-resolution quadrupole mass filter. In this work, we developed a strategy to differentiate contamination fragment ions (CFIs) from true fragment ions (TFIs) in an MS/MS spectrum. The rationale is that TFIs should coelute with their parent ions, but CFIs should not.
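
The coelution rationale above can be sketched in a few lines. This is a minimal illustration that assumes a Pearson correlation between each fragment's extracted ion chromatogram (EIC) and the precursor's EIC, sampled at the same retention-time points, with a fixed cutoff. The function names, the 0.8 threshold, and the use of Pearson correlation are assumptions for illustration, not details confirmed by the abstract.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length intensity traces."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a flat trace carries no coelution information
    return cov / math.sqrt(vx * vy)

def classify_fragments(precursor_eic, fragment_eics, threshold=0.8):
    """Label each fragment as 'TFI' (coelutes) or 'CFI' (does not).

    precursor_eic: precursor intensities over retention time.
    fragment_eics: dict mapping fragment m/z to its intensity trace.
    threshold: assumed correlation cutoff, chosen for illustration only.
    """
    labels = {}
    for mz, eic in fragment_eics.items():
        r = pearson(precursor_eic, eic)
        labels[mz] = "TFI" if r >= threshold else "CFI"
    return labels
```

For example, a fragment whose trace rises and falls with the precursor peak is labeled a TFI, while one that peaks earlier in the run and has decayed by the precursor apex is labeled a CFI.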


Existing data acquisition modes such as full-scan, data-dependent (DDA), and data-independent acquisition (DIA) often present limited capabilities in capturing metabolic information in liquid chromatography-mass spectrometry (LC-MS)-based metabolomics. In this work, we proposed a novel metabolomic data acquisition workflow that combines DDA and DIA analyses to achieve better metabolomic data quality, including enhanced metabolome coverage, tandem mass spectrometry (MS/MS) coverage, and MS/MS quality. This workflow, named data-dependent-assisted data-independent acquisition (DaDIA), performs untargeted metabolomic analysis of individual biological samples using DIA mode and the pooled quality control (QC) samples using DDA mode.
