MetGem Software for the Generation of Molecular Networks Based on the t-SNE Algorithm.

Anal Chem

Institut de Chimie des Substances Naturelles, CNRS UPR 2301, Université Paris-Sud, Université Paris-Saclay, Avenue de la Terrasse, 91198 Gif-sur-Yvette, France.

Published: December 2018

Molecular networking (MN) is becoming a standard bioinformatics tool in the metabolomic community. Its paradigm is based on the observation that compounds with a high degree of chemical similarity share comparable MS fragmentation pathways. To afford a clear separation between MS spectral clusters, only the most relevant similarity scores are selected using dedicated filtering steps requiring time-consuming parameter optimization. Depending on the filtering values selected, some scores are arbitrarily deleted and a part of the information is ignored. The problem of creating a reliable representation of MS spectra data sets can be solved using algorithms developed for dimensionality reduction and pattern recognition purposes, such as t-distributed stochastic neighbor embedding (t-SNE). This multivariate embedding method pays particular attention to local details by using nonlinear outputs to represent the entire data space. To overcome the limitations inherent to the GNPS workflow and the networking architecture, we developed MetGem. Our software allows the parallel investigation of two complementary representations of the raw data set, one based on a classic GNPS-style MN and another based on the t-SNE algorithm. The t-SNE graph preserves the interactions between related groups of spectra, while the MN output allows an unambiguous separation of clusters. Additionally, almost all parameters can be tuned in real time, and new networks can be generated within a few seconds for small data sets. With the development of this unified interface (https://metgem.github.io), we fulfilled the need for a dedicated, user-friendly, local software for MS comparison and spectral network generation.
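
The filtering step described in the abstract can be illustrated with a small sketch. It is not MetGem's or GNPS's actual code: the function name `filter_edges`, the parameters `min_score` and `top_k`, the mutual top-K rule, and the toy scores are all assumptions chosen to show how the same similarity scores produce different networks depending on the filter values.

```python
def filter_edges(scores, min_score=0.7, top_k=2):
    """Simplified, hypothetical GNPS-style edge filter.

    scores maps a pair of spectrum names (a, b) to a similarity score.
    """
    # Step 1: discard every pair whose score is below the threshold.
    kept = {pair: s for pair, s in scores.items() if s >= min_score}

    # Step 2: collect each node's surviving neighbours with their scores.
    neighbours = {}
    for (a, b), s in kept.items():
        neighbours.setdefault(a, []).append((s, b))
        neighbours.setdefault(b, []).append((s, a))

    # Step 3: remember only the top_k best-scoring neighbours of each node.
    top = {
        node: {other for _, other in sorted(lst, reverse=True)[:top_k]}
        for node, lst in neighbours.items()
    }

    # Step 4: keep an edge only if each endpoint is in the other's top_k.
    return {
        (a, b): s
        for (a, b), s in kept.items()
        if b in top[a] and a in top[b]
    }


# Toy pairwise scores between four hypothetical spectra.
toy_scores = {
    ("A", "B"): 0.92, ("A", "C"): 0.81, ("A", "D"): 0.75,
    ("B", "C"): 0.65, ("B", "D"): 0.40, ("C", "D"): 0.72,
}

strict = filter_edges(toy_scores, min_score=0.7, top_k=1)
loose = filter_edges(toy_scores, min_score=0.7, top_k=2)
print(sorted(strict))
print(sorted(loose))
```

With these toy values, the strict setting keeps a single edge and the looser setting keeps three, although the six underlying scores are identical. A t-SNE view, by contrast, would take the full set of pairwise distances (for example 1 - score) as input, so no score is discarded up front.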


Source
http://dx.doi.org/10.1021/acs.analchem.8b03099


Similar Publications

The recent emergence of new synthetic opioid (NSO) compounds in the illicit market is increasingly related to fatal cases. Identification and medical care of NSO intoxication cases are challenging, particularly due to the high frequency of new products and their extensive metabolism. As the study of NSO metabolism is crucial for the identification of these drugs in cases of intoxication, we aimed to investigate the metabolism of the piperazine NSO AP-237 (= bucinnazine).


Dataset on metabolome dimorphism in different organs of mature prawn.

Data Brief

June 2023

UNIHAVRE, UMR-I 02 INERIS-URCA-ULHN SEBIO, FR CNRS 3730 Scale, F-76063 Le Havre Cedex, France.

The prawn exhibits a large distribution (occurring along the Northeastern Atlantic coast to the Mediterranean), and has thus been found suitable as a model organism for various ecotoxicological studies. However, little is known about the potential input of its metabolome, particularly concerning a potential molecular sexual dimorphism observable in the different tissues of this organism. From an ecotoxicological point of view, inter-sex and inter-organ differences in the metabolomes may introduce analytical bias and affect the robustness of the analysis and its interpretation.


During the last two decades, MALDI-ToF mass spectrometry has become an efficient and widely used tool for identifying clinical isolates. However, its use for classification and identification of environmental microorganisms remains limited by the lack of reference spectra in current databases. In addition, the classical dendrogram-based data representation becomes harder to interpret as the number of taxa or chemotaxa grows, which causes problems of reproducibility between users.


The chemical diversity of biologically active fungal strains from 42 Colletotrichum, isolated from leaves of the tropical palm species Astrocaryum sciophilum collected in pristine forests of French Guiana, was investigated. The collection was first classified based on protein fingerprints acquired by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) correlated with cytotoxicity. Liquid chromatography coupled to high-resolution tandem mass spectrometry (LC-HRMS/MS) data from ethyl acetate extracts were acquired and processed to generate a massive molecular network (MN) using the MetGem software.


Generation of a Molecular Network from Electron Ionization Mass Spectrometry Data by Combining MZmine2 and MetGem Software.

Anal Chem

September 2019

Institut de Chimie des Substances Naturelles, CNRS UPR2301, Université Paris-Sud, Université Paris-Saclay, Avenue de la Terrasse, 91190 Gif-sur-Yvette, France.

Molecular networking (MN) allows one to organize tandem mass spectrometry (MS/MS) data by spectral similarities. The cosine score, used as a metric to calculate the distance between two spectra, is based on peak lists containing fragments and neutral losses from MS/MS spectra. Until now, the workflow excluded the generation of a molecular network from electron ionization (EI) MS data, as no putative parent ion is selected when performing classical gas chromatography (GC)-EI-MS analysis.
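
As a rough illustration of the scoring described above, the sketch below computes a plain binned cosine score between two peak lists and derives neutral losses from a precursor m/z. It is a simplification under stated assumptions, not the scoring used by GNPS, MZmine2, or MetGem: the helper names `binned`, `cosine_score`, and `neutral_losses`, the 1.0 m/z bin width, and the example peaks are all hypothetical.

```python
import math


def binned(peaks, bin_width=1.0):
    """Sum intensities of (mz, intensity) peaks falling into the same m/z bin."""
    vector = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        vector[key] = vector.get(key, 0.0) + intensity
    return vector


def cosine_score(peaks_a, peaks_b, bin_width=1.0):
    """Cosine of the angle between two binned intensity vectors (0.0 to 1.0)."""
    va, vb = binned(peaks_a, bin_width), binned(peaks_b, bin_width)
    dot = sum(va[k] * vb[k] for k in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def neutral_losses(peaks, precursor_mz):
    """Neutral losses need a precursor m/z, which classical GC-EI-MS does not select."""
    return [(precursor_mz - mz, intensity) for mz, intensity in peaks]


# Two hypothetical fragment peak lists as (m/z, intensity) pairs.
spectrum_a = [(77.0, 100.0), (105.0, 50.0)]
spectrum_b = [(77.0, 80.0), (105.0, 60.0), (122.0, 10.0)]
print(cosine_score(spectrum_a, spectrum_b))
print(neutral_losses(spectrum_a, precursor_mz=150.0))
```

The `neutral_losses` helper makes the limitation concrete: it cannot be called without a precursor m/z, which is why a parent-ion substitute is needed before EI data can enter a workflow built on MS/MS spectra.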

