Background: A large body of clinical evidence shows that many human disease phenotypes, e.g., inflammation, obesity, HIV, and diabetes, are related to the gut microbiome. Supervised classification makes it feasible to determine a person's disease state from the composition of the intestinal microbiota. However, because the abundance matrix of microbiome data is very sparse, an interpretable deep model, such as the deep forest model, is needed to represent and mine the data further. Moreover, the original deep forest model can still overfit when applied to such "large p, small n" biological data. Feature reduction is therefore considered as a way to improve the ensemble forest model, particularly for disease identification from the human microbiota.

Methods: In this work, we propose a kernel principal components based cascade forest method, called KPCCF, to classify the disease states of patients from taxonomic profiles of the microbiome at the family level. Kernel principal component analysis is first used to reduce the original dimensionality of the human microbiota datasets. The reduced data are then fed into the cascade forest to discriminate the disease state of the samples.
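The two stages described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, an RBF kernel, a fixed two-level cascade with one random forest and one extremely randomized forest per level, and a synthetic sparse abundance matrix. All function names, hyperparameters, and data here are hypothetical choices made for the example.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_predict, train_test_split


def make_forests(seed):
    """One cascade level: a random forest and a near completely-random forest."""
    return [
        RandomForestClassifier(n_estimators=50, random_state=seed),
        ExtraTreesClassifier(n_estimators=50, max_features=1, random_state=seed),
    ]


def fit_cascade(X, y, n_levels=2, cv=3, seed=0):
    """Fit a fixed-depth cascade (assumed depth) and return the fitted levels."""
    levels, augmented = [], X
    for level in range(n_levels):
        forests = make_forests(seed + level)
        # Out-of-fold class probabilities reduce leakage into the next level.
        probas = [
            cross_val_predict(f, augmented, y, cv=cv, method="predict_proba")
            for f in forests
        ]
        for f in forests:
            f.fit(augmented, y)
        levels.append(forests)
        # The next level sees the reduced features plus this level's class vectors.
        augmented = np.hstack([X] + probas)
    return levels


def predict_cascade(levels, X):
    """Pass samples through every level and average the last level's class vectors."""
    augmented = X
    for forests in levels:
        probas = [f.predict_proba(augmented) for f in forests]
        augmented = np.hstack([X] + probas)
    classes = levels[-1][0].classes_
    return classes[np.mean(probas, axis=0).argmax(axis=1)]


# Synthetic "large p, small n" data: 60 samples, 200 taxa, about 90% zeros.
rng = np.random.default_rng(0)
n_samples, n_taxa = 60, 200
y = np.repeat([0, 1], n_samples // 2)
X = rng.gamma(2.0, 1.0, size=(n_samples, n_taxa)) * (rng.random((n_samples, n_taxa)) < 0.1)
X[y == 1, :10] += rng.gamma(2.0, 1.0, size=(n_samples // 2, 10))  # hypothetical signal

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=20, stratify=y, random_state=0
)

# Stage 1: kernel PCA, fitted on the training samples only.
kpca = KernelPCA(n_components=10, kernel="rbf")
Z_train = kpca.fit_transform(X_train)
Z_test = kpca.transform(X_test)

# Stage 2: cascade forest on the reduced representation.
levels = fit_cascade(Z_train, y_train)
pred = predict_cascade(levels, Z_test)
print("held-out accuracy:", float(np.mean(pred == y_test)))
```

Fitting the kernel PCA on the training split only keeps the held-out samples out of the feature-reduction step. A full deep forest would typically grow levels until validation performance stops improving; the fixed depth here only keeps the sketch short.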

Results: The proposed KPCCF algorithm can represent small-sample, high-dimensional human microbiota datasets with sparse feature matrices. Systematic comparison experiments on 4 datasets show that our method consistently outperforms state-of-the-art methods.

Conclusion: Although different application domains share some common characteristics, no one-size-fits-all solution exists for all of them. The traditional deep model has limitations in biological applications, where small sample sizes are paired with high dimensionality. KPCCF distinguishes itself from the standard deep forest model by its performance in the microbiota field. Additionally, compared with other dimensionality reduction methods, kernel principal component analysis is more suitable for microbiota datasets.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8697468
DOI: http://dx.doi.org/10.1186/s12911-021-01705-5
