Microbiome samples collected from urban environments can help predict the geographic origin of samples of unknown provenance. Because different cities may carry geographically distinct microbial signatures, city-specific microbiome samples can be used to predict where a sample was collected. We implemented this idea by first applying standard bioinformatics procedures to pre-process the raw metagenomic samples provided by the CAMDA organizers. We then trained several component classifiers, as well as a robust ensemble classifier, on data generated with both taxonomy-dependent and taxonomy-free approaches. To overcome the class imbalance in the primary data, we also applied class weighting and an optimal oversampling technique. In every setting, the component classifiers performed differently from one another, whereas the ensemble classifier consistently yielded optimal performance. Finally, we predicted the source cities of the mystery samples provided by the organizers. Our results show that relying on a single classification algorithm to assign metagenomic samples to their source origins is unreliable. By combining several component classifiers through an ensemble, we obtained classification results as good as those of the best-performing component classifier.
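The abstract does not name the component classifiers, the class-weighting formula, the oversampling method, or how the ensemble combines its members. As a rough illustration only, the Python sketch below assumes inverse-frequency ("balanced") class weights, plain random oversampling as a stand-in for the unspecified "optimal oversampling technique", and weighted soft voting over per-classifier class probabilities. The function names, city labels, and probability values are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights, n_samples / (n_classes * n_c), so that
    under-represented cities carry as much total weight as common ones."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}


def random_oversample(samples, labels, seed=0):
    """Plain random oversampling (an assumed stand-in): resample each
    minority class with replacement until it matches the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y


def soft_vote(prob_dicts, weights=None):
    """Weighted soft voting: sum each component classifier's class
    probabilities, scaled by its weight, and return the top-scoring class."""
    if weights is None:
        weights = [1.0] * len(prob_dicts)
    combined = Counter()
    for probs, w in zip(prob_dicts, weights):
        for cls, p in probs.items():
            combined[cls] += w * p
    return max(combined, key=combined.get)


# Usage with hypothetical city labels and made-up classifier outputs.
labels = ["city_A"] * 6 + ["city_B"] * 3 + ["city_C"]
print(balanced_class_weights(labels))
xs, ys = random_oversample(list(range(10)), labels)
print(Counter(ys))
outputs = [
    {"city_A": 0.6, "city_B": 0.3, "city_C": 0.1},
    {"city_A": 0.2, "city_B": 0.7, "city_C": 0.1},
    {"city_A": 0.3, "city_B": 0.5, "city_C": 0.2},
]
print(soft_vote(outputs))             # -> city_B (equal weights)
print(soft_vote(outputs, [3, 1, 1]))  # -> city_A (first classifier trusted more)
```

In practice, oversampling should be applied only to the training split of each cross-validation fold, so that duplicated samples never leak into the evaluation data. The ensemble weights could, for example, be set from each component classifier's validation performance, though the abstract does not say how (or whether) the authors weighted their components.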

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8093763
DOI: http://dx.doi.org/10.3389/fgene.2021.642282

Similar Publications

Purpose: Radiomics-based machine learning (ML) models of amino acid positron emission tomography (PET) images have shown efficiency in glioma prediction tasks. However, their clinical impact on physician interpretation remains limited. This study investigated whether an explainable radiomics model modifies nuclear physicians' assessment of glioma aggressiveness at diagnosis.

Automated Classification of Cardiac Arrhythmia using Short-Duration ECG Signals and Machine Learning.

Biomed Phys Eng Express

January 2025

Electronics and Communication Engineering, Rajiv Gandhi University, Rono Hills, Doimukh, Itanagar, Arunachal Pradesh 791112, India.

Accurate detection of cardiac arrhythmias is crucial for preventing premature deaths. The current study employs a dual-stage Discrete Wavelet Transform (DWT) and a median filter to eliminate noise from ECG signals. Subsequently, ECG signals are segmented, and QRS regions are extracted for further preprocessing.

Multiclass Synthetic Accessibility Prediction.

J Chem Inf Model

January 2025

X-Chem Global HQ, 100 Beaver Street, Waltham, Massachusetts 02453, United States.

Evaluating synthetic accessibility of molecules is an integral component of the drug discovery process. While the application of machine learning models to predict whether small molecules are easy or hard to synthesize has gained attention recently, predetermined thresholds and data set imbalances present challenges for these binary classification approaches. In this study, we introduce a novel multiclass fold-ensembled classification approach to predict the minimum number of steps needed to synthesize a small molecule.

This study aimed to develop an advanced ensemble approach for automated classification of mental health disorders in social media posts. The research question was whether an ensemble of fine-tuned transformer models (XLNet, RoBERTa, and ELECTRA) with Bayesian hyperparameter optimization could improve the accuracy of mental health disorder classification in social media text. The three transformer models were fine-tuned on a dataset of social media posts labelled with 15 distinct mental health disorders.

This study presents a novel integration of two advanced deep learning models, U-Net and EfficientNetV2, to achieve high-precision segmentation and rapid classification of pathological images. A key innovation is the development of a new heatmap generation algorithm, which leverages meticulous image preprocessing, data enhancement strategies, ensemble learning, attention mechanisms, and deep feature fusion techniques. This algorithm not only produces highly accurate and interpretatively rich heatmaps but also significantly improves the accuracy and efficiency of pathological image analysis.
