Evaluation and comparison of bioinformatic tools for the enrichment analysis of metabolomics data.

BMC Bioinformatics

Biomarkers & Nutrimetabolomics Laboratory, Nutrition, Food Science and Gastronomy Department, Food Technology Reference Net (XaRTA), Nutrition and Food Safety Research Institute (INSA-UB), Faculty of Pharmacy and Food Sciences, Pharmacy and Food Science Faculty, University of Barcelona, Barcelona, Spain.

Published: January 2018

Background: Bioinformatic tools for the enrichment of 'omics' datasets facilitate the interpretation and understanding of data. To date, few are suitable for metabolomics datasets. The main objective of this work is to give, for the first time, a critical overview of the performance of these tools. To that end, datasets from metabolomic repositories were selected and enriched data were created. Both types of data were analysed with these tools and the outputs were thoroughly examined.

Results: An exploratory multivariate analysis of the most used tools for the enrichment of metabolite sets, based on non-metric multidimensional scaling (NMDS) of Jaccard distances, was performed and reflected their diversity. The identifiers of the dataset metabolites were searched in different metabolite databases (HMDB, KEGG, PubChem, ChEBI, BioCyc/HumanCyc, LipidMAPS, ChemSpider, METLIN and Recon2). PubChem contained the most identifiers for the dataset metabolites, followed by METLIN and ChEBI. However, these databases had duplicated entries and might yield false positives. The performance of over-representation analysis (ORA) tools, including BioCyc/HumanCyc, ConsensusPathDB, IMPaLA, MBRole, MetaboAnalyst, Metabox, MetExplore, MPEA, PathVisio and Reactome, and of the mapping tool KEGGREST was examined. Results were mostly consistent among tools and between real and enriched data, despite the variability of the tools. Nevertheless, a few inconsistent results, such as differences in the total number of metabolites, were also found. Disease-based enrichment analyses were also assessed but were not found to be accurate, probably because metabolite disease sets are not up to date and because predicting diseases from a list of metabolites is difficult.
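To make the Jaccard-distance step concrete, the following is a minimal sketch, not the authors' code. It assumes each tool is described by a set of binary characteristics; the tool names and feature labels are hypothetical and chosen only for illustration. The resulting pairwise distance matrix is the kind of input an NMDS routine (for example, a non-metric mode of a general MDS implementation) would take.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance between two sets: 1 - |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention: two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)


def distance_matrix(features):
    """Build a symmetric matrix of pairwise Jaccard distances.

    `features` maps an item name to its set of characteristics.
    Returns the sorted names and a nested dict indexed by name.
    """
    names = sorted(features)
    matrix = {n: {m: 0.0 for m in names} for n in names}
    for x, y in combinations(names, 2):
        d = jaccard_distance(features[x], features[y])
        matrix[x][y] = matrix[y][x] = d
    return names, matrix


# Hypothetical tools and characteristics (assumptions, not data from the paper).
tool_features = {
    "tool_A": {"ORA", "KEGG", "web"},
    "tool_B": {"ORA", "KEGG", "R"},
    "tool_C": {"mapping", "R"},
}

names, dist = distance_matrix(tool_features)
for name in names:
    print(name, [round(dist[name][other], 2) for other in names])
```

Tools that share most characteristics end up close to 0 and tools with nothing in common end up at 1, which is what lets an ordination such as NMDS place similar tools near each other.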

Conclusions: We have extensively reviewed the state of the art of the tools available for metabolomic datasets, the completeness of metabolite databases, the performance of ORA methods and disease-based analyses. Despite their variability, the tools provided consistent results regardless of their analytic approach. However, more work is required on the completeness of metabolite and pathway databases, which strongly affects the accuracy of enrichment analyses. Improvements will translate into more accurate and global insights into the metabolome.
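The dependence of ORA results on database completeness can be illustrated with a small sketch. It assumes the one-sided hypergeometric test that is commonly used for over-representation analysis; the individual tools reviewed here may use other statistics or corrections, and all numbers below are invented for illustration.

```python
from math import comb


def ora_p_value(hits, selected, pathway_size, background_size):
    """One-sided hypergeometric p-value, P(X >= hits).

    hits            -- selected metabolites that fall in the pathway
    selected        -- size of the metabolite list being tested
    pathway_size    -- metabolites annotated to the pathway
    background_size -- total metabolites in the reference set
    """
    upper = min(selected, pathway_size)
    total = comb(background_size, selected)
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, selected - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Same list and pathway, two hypothetical background sizes (assumed values).
p_small_background = ora_p_value(5, 20, 30, 1000)
p_large_background = ora_p_value(5, 20, 30, 3000)
print(f"background 1000: p = {p_small_background:.3g}")
print(f"background 3000: p = {p_large_background:.3g}")
```

With the list and the pathway held fixed, only the size of the reference set changes, yet the p-value shifts. This is one way in which differences in the total number of metabolites known to a tool or database can lead to different enrichment results.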


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5749025
DOI: http://dx.doi.org/10.1186/s12859-017-2006-0

Similar Publications

A new thin film was fabricated using Fe3O4@SiO2-polyoxometalate (POM) as the coating and was coupled with HPLC-UV to develop a method for the selective determination of ibuprofen, paracetamol and diclofenac (as the model analytes) in human plasma and urine samples. The prepared magnetic POM was coated on the pores and surface of cotton yarn to prepare the extracting device. The prepared sorbent was characterized by several techniques, including FT-IR, XRD, BET, SEM and VSM analysis.


Validation and in silico function prediction of circtial1 as a novel marker of abnormal lung development in nitrofen-induced congenital diaphragmatic hernia (CDH).

Pediatr Surg Int

December 2024

Division of Pediatric Surgery, Department of Surgery, Max Rady College of Medicine, University of Manitoba, and Children's Hospital Research Institute of Manitoba, AE402-820 Sherbrook Street, Winnipeg, MB, R3A 1S1, Canada.

Purpose: Circular RNAs (circRNAs) are stable, non-coding RNAs with tissue- and developmental-specific expression, making them suitable biomarkers for congenital anomalies. Current circRNA discovery pipelines have focused on human and mouse. We aim to bridge this gap by combining bioinformatics resources and using circtial1 as a model candidate in the nitrofen rat model of congenital diaphragmatic hernia (CDH).


Background: This commentary article critically assesses the inclusion and recognition of young adults with lived and living experiences (YALLE) in academic publishing. Stemming from our involvement in a health research study, this analysis interrogates the disparity between the stated importance of YALLE contributions in health research and their actual recognition, specifically in academic publications, which serve as the principal "currency" in research. This tokenism limits the potential for their unique insights to substantially enrich the discourse and dissemination of knowledge.


Complementary insights into gut viral genomes: a comparative benchmark of short- and long-read metagenomes using diverse assemblers and binners.

Microbiome

December 2024

Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular Imaging, Department of Bioinformatics and Systems Biology, Center for Artificial Intelligence Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China.

Background: Metagenome-assembled viral genomes have significantly advanced the discovery and characterization of the human gut virome. However, we lack a comparative assessment of assembly tools on the efficacy of viral genome identification, particularly across next-generation sequencing (NGS) and third-generation sequencing (TGS) data.

Results: We evaluated the efficiency of NGS, TGS, and hybrid assemblers for viral genome discovery using 95 viral-like particle (VLP)-enriched fecal samples sequenced on both Illumina and PacBio platforms.


Mining single-cell data for cell type-disease associations.

NAR Genom Bioinform

December 2024

Precision Health, The Kids Research Institute Australia, 15 Hospital Ave, Nedlands, 6009, WA, Australia.

A robust understanding of the cellular mechanisms underlying diseases sets the foundation for the effective design of drugs and other interventions. The wealth of existing single-cell atlases offers the opportunity to uncover high-resolution information on expression patterns across various cell types and time points. To better understand the associations between cell types and diseases, we leveraged previously developed tools to construct a standardized analysis pipeline and systematically explored associations across four single-cell datasets, spanning a range of tissue types, cell types and developmental time periods.

