Benchmark of Data Processing Methods and Machine Learning Models for Gut Microbiome-Based Diagnosis of Inflammatory Bowel Disease.

Ryszard Kubinski, Jean-Yves Djamen-Kepaou, Timur Zhanabaev, Alex Hernandez-Garcia, Stefan Bauer, Falk Hildebrand, Tamas Korcsmaros, Sani Karam, Prévost Jantchou, Kamran Kafi, Ryan D Martin

Front Genet

Phyla Technologies Inc, Montréal, QC, Canada.

Published: February 2022

Patients with inflammatory bowel disease (IBD) wait months and undergo numerous invasive procedures between the first appearance of symptoms and diagnosis. To shorten the time to diagnosis and improve patient wellbeing, researchers are exploring machine learning algorithms that diagnose IBD from the composition of the gut microbiome. To date, these models have seen limited clinical use because their performance drops when they are applied to a new cohort of patient samples. Various methods have been developed to analyze microbiome data, and these may improve the generalizability of machine learning IBD diagnostic tests. Given the abundance of methods, there is a need to benchmark the performance and generalizability of complete machine learning pipelines (from data processing to model training) for microbiome-based IBD diagnostic tools. We collected fifteen 16S rRNA microbiome datasets (7,707 samples) from North America to benchmark combinations of gut microbiome features, data normalization and transformation methods, batch effect correction methods, and machine learning models. Pipeline generalizability to new patient cohorts was evaluated with two binary classification metrics under leave-one-dataset-out (LODO) cross-validation, in which all samples from one study were withheld from the training set and used as the test set. We show that taxonomic features processed with a compositional transformation method, combined with batch effect correction by the naive zero-centering method, attain the best classification performance. In addition, machine learning models that can learn non-linear decision boundaries between labels generalize better than linearly constrained models. Lastly, we illustrate the importance of curating the training dataset to ensure similar performance across patient demographics.
These findings will help improve the generalizability of machine learning models as we move towards non-invasive diagnostic and disease management tools for patients with IBD.
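The abstract names the pipeline stages but not their exact implementations, so the following is only a minimal sketch under stated assumptions. It assumes the centred log-ratio (CLR) transform as the "compositional transformation", implements naive zero-centering as subtracting each dataset's own per-feature mean, and runs leave-one-dataset-out (LODO) evaluation. The nearest-centroid classifier and the accuracy score are hypothetical placeholders, not the paper's benchmarked models or its two metrics; all function names are illustrative.

```python
import numpy as np


def clr_transform(counts, pseudocount=1.0):
    """Centred log-ratio (CLR) transform of a samples x taxa count matrix.

    ASSUMPTION: CLR stands in for the unspecified "compositional transformation";
    the pseudocount (needed because log(0) is undefined) is illustrative only.
    """
    log_x = np.log(np.asarray(counts, dtype=float) + pseudocount)
    # Subtracting each sample's mean log value divides by its geometric mean.
    return log_x - log_x.mean(axis=1, keepdims=True)


def zero_center_by_dataset(features, dataset_ids):
    """Naive zero-centering: subtract each dataset's own per-feature mean."""
    features = np.asarray(features, dtype=float)
    dataset_ids = np.asarray(dataset_ids)
    centered = np.empty_like(features)
    for d in np.unique(dataset_ids):
        mask = dataset_ids == d
        centered[mask] = features[mask] - features[mask].mean(axis=0)
    return centered


def lodo_splits(dataset_ids):
    """Yield (held_out, train_idx, test_idx) with one whole dataset held out per fold."""
    dataset_ids = np.asarray(dataset_ids)
    for d in np.unique(dataset_ids):
        yield d, np.flatnonzero(dataset_ids != d), np.flatnonzero(dataset_ids == d)


def nearest_centroid_fit_predict(x_train, y_train, x_test):
    """HYPOTHETICAL placeholder model: assign each test sample to the closer class mean."""
    c0 = x_train[y_train == 0].mean(axis=0)
    c1 = x_train[y_train == 1].mean(axis=0)
    d0 = np.linalg.norm(x_test - c0, axis=1)
    d1 = np.linalg.norm(x_test - c1, axis=1)
    return (d1 < d0).astype(int)


def run_lodo(counts, labels, dataset_ids, fit_predict=nearest_centroid_fit_predict):
    """Preprocess, then score the model on each held-out dataset (accuracy as a stand-in metric)."""
    x = zero_center_by_dataset(clr_transform(counts), dataset_ids)
    y = np.asarray(labels)
    scores = {}
    for held_out, train, test in lodo_splits(dataset_ids):
        pred = fit_predict(x[train], y[train], x[test])
        scores[held_out] = float(np.mean(pred == y[test]))
    return scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    counts = rng.poisson(5, size=(30, 8))   # synthetic toy counts, not real data
    labels = np.tile([0, 1], 15)            # 0 = control, 1 = IBD (toy labels)
    dataset_ids = np.repeat(["A", "B", "C"], 10)
    print(run_lodo(counts, labels, dataset_ids))
```

Because zero-centering uses only each dataset's own unlabelled features, the held-out cohort can be centred without looking at its labels. Swapping `nearest_centroid_fit_predict` for a non-linear model is where a linear versus non-linear comparison like the one described above would plug in.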

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8895431
DOI: http://dx.doi.org/10.3389/fgene.2022.784397

Similar Publications

Construction of a prognostic signature based on T-helper 17 cells differentiation-related genes for predicting survival and tumor microenvironment in head and neck squamous cell carcinoma.

Medicine (Baltimore)

January 2025

Department of Otolaryngology, Hangzhou Red Cross Hospital (Zhejiang Hospital of Integrated Traditional Chinese and Western Medicine), Hangzhou, Zhejiang, China.

Shiqin Chen, Pingcun Wei, Gang Wang, Fan Wu, Jianjun Zou

T-helper 17 (Th17) cells significantly influence the onset and progression of malignancies. This study focused on delineating molecular classifications and developing a prognostic signature grounded in Th17 cell differentiation-related genes (TCDRGs) using machine learning algorithms in head and neck squamous cell carcinoma (HNSCC). A consensus clustering approach was applied to The Cancer Genome Atlas-HNSCC cohort based on TCDRGs, followed by an examination of differential gene expression using the limma package.

Ultrasensitive Detection of Circulating Plasma Cells Using Surface-Enhanced Raman Spectroscopy and Machine Learning for Multiple Myeloma Monitoring.

Anal Chem

January 2025

Key Laboratory of OptoElectronic Science and Technology for Medicine of Ministry of Education, Fujian Provincial Key Laboratory of Photonics Technology, Fujian Normal University, Fuzhou, Fujian 350117, China.

Dechun Zhang, Xianling Chen, Jia Lin, Shiyan Jiang, Min Fan

Multiple myeloma is a hematologic malignancy characterized by the proliferation of abnormal plasma cells in the bone marrow. Despite therapeutic advancements, there remains a critical need for reliable, noninvasive methods to monitor multiple myeloma. Circulating plasma cells (CPCs) in peripheral blood are robust and independent prognostic markers, but their detection is challenging due to their low abundance.

AI-Driven Innovations for Early Sepsis Detection by Combining Predictive Accuracy With Blood Count Analysis in an Emergency Setting: Retrospective Study.

J Med Internet Res

January 2025

Division of Clinical Pathology, Department of Pathology, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan.

Tai-Han Lin, Hsing-Yi Chung, Ming-Jr Jian, Chih-Kai Chang, Hung-Hsin Lin

Background: Sepsis, a critical global health challenge, accounted for approximately 20% of worldwide deaths in 2017. Although the Sequential Organ Failure Assessment (SOFA) score standardizes the diagnosis of organ dysfunction, early sepsis detection remains challenging due to its insidious symptoms. Current diagnostic methods, including clinical assessments and laboratory tests, frequently lack the speed and specificity needed for timely intervention, particularly in vulnerable populations such as older adults, intensive care unit (ICU) patients, and those with compromised immune systems.

Delta-Radiomics Using Machine Learning Classifiers With Auxiliary Data Sets to Predict Disease Progression During Magnetic Resonance-Guided Radiotherapy in Adrenal Metastases.

JCO Clin Cancer Inform

January 2025

Machine Learning Department, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL.

Jesutofunmi A Fajemisin, John M Bryant, Payman G Saghand, Matthew N Mills, Kujtim Latifi

Purpose: Adaptive radiotherapy accounts for interfractional anatomic changes. We hypothesize that changes in the gross tumor volumes identified during daily scans could be analyzed using delta-radiomics to predict disease progression events. We evaluated whether an auxiliary data set could improve prediction performance.

Machine Learning to Predict Mortality in Older Patients With Cancer: Development and External Validation of the Geriatric Cancer Scoring System Using Two Large French Cohorts.

J Clin Oncol

January 2025

INSERM, IMRBU955, Univ Paris Est Créteil, Créteil, France.

Etienne Audureau, Pierre Soubeyran, Claudia Martinez-Tapia, Carine Bellera, Sylvie Bastuji-Garin

Purpose: Establishing an accurate prognosis remains challenging in older patients with cancer because of the population's heterogeneity and the current predictive models' reduced ability to capture the complex interactions between oncologic and geriatric predictors. We aim to develop and externally validate a new predictive score (the Geriatric Cancer Scoring System [GCSS]) to refine individualized prognosis for older patients with cancer during the first year after a geriatric assessment (GA).

Materials And Methods: Data were collected from two French prospective multicenter cohorts of patients with cancer 70 years and older, referred for GA: ELCAPA (training set January 2007-March 2016) and ONCODAGE (validation set August 2008-March 2010).
