Opening the Random Forest Black Box of ¹H NMR Metabolomics Data by the Exploitation of Surrogate Variables.

Metabolites

Institute of Food Chemistry, Hamburg School of Food Science, University of Hamburg, Grindelallee 117, 20146 Hamburg, Germany.

Published: October 2023

Untargeted metabolomics analysis of biological samples by nuclear magnetic resonance (NMR) spectroscopy yields highly complex data containing many signals from different molecules. Machine learning methods are used to classify such data, e.g., in the context of food authentication. These methods are usually applied as a black box, meaning that no information is obtained about the complex relationships between the variables and the outcome. In this study, we show that surrogate minimal depth (SMD), a random forest-based approach, can be applied for a comprehensive analysis of class-specific differences by selecting relevant variables and analyzing their mutual impact on a classification model of different truffle species. SMD allows variables originating from the same metabolite to be assigned to each other and detects interactions between different metabolites that can be attributed to known biological relationships.
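As a rough illustration of the minimal depth idea that SMD builds on, the sketch below computes, for each variable, the shallowest depth at which it is used for a split in each tree and averages that value over the forest; lower values suggest more relevant variables. It is a simplified sketch and not the method of the study: surrogate variables are not considered, the toy trees and the variable names (`ppm_1.3` etc.) are hypothetical, and assigning the tree's depth to variables that are never used in a tree is an assumption made here for simplicity.

```python
from statistics import mean

# A tree node is (variable, left, right); a leaf is None.
# These toy trees and the variable names are hypothetical.
TREES = [
    ("ppm_3.2", ("ppm_1.3", None, None), ("ppm_5.4", None, None)),
    ("ppm_1.3", ("ppm_3.2", None, ("ppm_5.4", None, None)), None),
    ("ppm_3.2", None, ("ppm_1.3", None, None)),
]

def tree_depth(node):
    """Number of split levels in the tree (0 for a leaf)."""
    if node is None:
        return 0
    _, left, right = node
    return 1 + max(tree_depth(left), tree_depth(right))

def first_split_depths(node, depth=0, found=None):
    """Map each variable to the shallowest depth at which it splits."""
    if found is None:
        found = {}
    if node is None:
        return found
    var, left, right = node
    found[var] = min(depth, found.get(var, depth))
    first_split_depths(left, depth + 1, found)
    first_split_depths(right, depth + 1, found)
    return found

def mean_minimal_depth(trees, variables):
    """Average the per-tree minimal depth of each variable over all trees."""
    per_var = {v: [] for v in variables}
    for tree in trees:
        depths = first_split_depths(tree)
        penalty = tree_depth(tree)  # assumed value for unused variables
        for v in variables:
            per_var[v].append(depths.get(v, penalty))
    return {v: mean(d) for v, d in per_var.items()}

variables = ["ppm_1.3", "ppm_3.2", "ppm_5.4"]
scores = mean_minimal_depth(TREES, variables)
# Variables sorted from most to least relevant (smallest mean depth first).
ranking = sorted(variables, key=scores.get)
```

In this toy forest, `ppm_3.2` splits at the root of two of the three trees and therefore ranks first. SMD additionally takes surrogate splits into account, which this sketch leaves out.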

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10608983
DOI: http://dx.doi.org/10.3390/metabo13101075
