As the availability of high-throughput metagenomic data increases, agile and accurate tools are required to analyze and exploit this valuable and plentiful resource. Cellulose-degrading enzymes have various applications, and finding appropriate cellulases for different purposes is becoming increasingly challenging. A screening method for high-throughput data can be of great assistance when combined with the characterization of thermal and pH dependence. By this means, various metagenomic sources with high cellulolytic potential can be explored. Using sequence similarity-based annotation and an ensemble of supervised learning algorithms, this study aims to identify cellulolytic enzymes in high-throughput metagenomic data and to characterize them by optimum temperature and pH. The prediction performance of MCIC (metagenome cellulase identification and characterization) was evaluated through multiple iterations of sixfold cross-validation tests. The tool was also applied in a comparative analysis of four metagenomic sources to estimate their cellulolytic profiles and capabilities. For experimental validation of MCIC's screening and prediction abilities, two enzymes identified from cattle rumen were subjected to cloning, expression, and characterization. To the best of our knowledge, this is the first time that a sequence similarity-based method has been used alongside an ensemble machine learning model to identify and characterize cellulase enzymes from extensive metagenomic data. This study highlights the strength of machine learning techniques in predicting enzymatic properties based solely on sequence. MCIC is freely available as a Python package and as a standalone toolkit for Windows and Linux-based operating systems, with several functions to facilitate the screening of cellulases and the prediction of their thermal and pH dependence.
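The evaluation scheme named in the abstract, repeated sixfold cross-validation of an ensemble of supervised learners, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not specify MCIC's features, base learners, or voting rule, so the toy data, the single-feature nearest-centroid learners, and the function names below are hypothetical stand-ins, not MCIC's implementation or API.

```python
import random
from collections import Counter


def make_toy_data(n_per_class=12, n_features=3, seed=42):
    """Synthetic stand-in data (NOT real enzyme features or labels)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in (("low_optimum", 0.0), ("high_optimum", 1.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 0.1) for _ in range(n_features)])
            y.append(label)
    return X, y


def kfold_splits(n_samples, n_folds=6, seed=0):
    """Yield (train_idx, test_idx) for one shuffled k-fold pass."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


def fit_centroid(X, y, feature):
    """Hypothetical base learner: per-class mean of a single feature."""
    sums, counts = Counter(), Counter()
    for row, label in zip(X, y):
        sums[label] += row[feature]
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in counts}


def predict_centroid(centroids, row, feature):
    """Return the class whose centroid is closest on this feature."""
    return min(centroids, key=lambda label: abs(row[feature] - centroids[label]))


def majority_vote(labels):
    """Combine the base learners' labels into one ensemble label."""
    return Counter(labels).most_common(1)[0][0]


def repeated_cv_accuracy(X, y, n_folds=6, n_repeats=3):
    """Mean ensemble accuracy over several reshuffled k-fold passes."""
    n_features = len(X[0])
    scores = []
    for repeat in range(n_repeats):
        for train, test in kfold_splits(len(X), n_folds, seed=repeat):
            X_tr = [X[i] for i in train]
            y_tr = [y[i] for i in train]
            # One base learner per feature, trained on the training folds only.
            models = [fit_centroid(X_tr, y_tr, f) for f in range(n_features)]
            correct = 0
            for i in test:
                votes = [predict_centroid(m, X[i], f) for f, m in enumerate(models)]
                correct += majority_vote(votes) == y[i]
            scores.append(correct / len(test))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    X, y = make_toy_data()
    print(f"mean accuracy: {repeated_cv_accuracy(X, y):.3f}")
```

Reshuffling with a different seed on each repeat is what makes the iterations differ from one another; the reported score is the mean over all folds of all repeats.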
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7645119 | PMC |
| http://dx.doi.org/10.3389/fmicb.2020.567863 | DOI Listing |
Data Brief
February 2025
Department of Biology, Allama Iqbal Open University, Islamabad, Pakistan.
Plants are colonized by a vast array of microorganisms that outstrip plant cell densities and genes, and are thus referred to as the plant's second genome or extended genome. These microbial communities exert a significant influence on the vigor, growth, development and productivity of plants by supporting nutrient acquisition, organic matter decomposition and tolerance against biotic and abiotic stresses such as heat, high salt, drought and disease, and by regulating plant defense responses. The rhizosphere is a complex micro-ecological zone in the direct vicinity of plant roots and is considered a hotspot of microbial diversity.
Ann Vasc Dis
January 2025
Division of Cardiovascular Medicine, Kobe University Graduate School of Medicine, Kobe, Hyogo, Japan.
The pathophysiological mechanism of abdominal aortic aneurysm (AAA) remains unclear. We previously reported that levels were reduced in the feces of patients with AAA by 16S ribosomal ribonucleic acid (RNA) gene sequencing. In this study, we increased the number of cases and conducted metagenomic analyses to examine bacterial genes associated with the pathophysiology of AAA.
Microlife
January 2025
Environmental Metagenomics, Research Center One Health Ruhr of the University Alliance Ruhr, Faculty of Chemistry, University of Duisburg-Essen, 45141 Essen, Germany.
Oil reservoirs are society's primary source of hydrocarbons. While microbial communities in industrially exploited oil reservoirs have been investigated in the past, pristine microbial communities in untapped oil reservoirs are little explored, as are distribution patterns of respective genetic signatures. Here, we show that a pristine oil sample contains a complex community consisting of bacteria and fungi for the degradation of hydrocarbons.
Biochem Mol Biol Educ
January 2025
Research Group of Environmental Metagenomics, Leiden Centre for Applied Bioscience, Leiden University of Applied Sciences, Leiden, Netherlands.
Targeted metagenomics is a rapidly expanding technology for analyzing complex biological samples and for genetic monitoring of environmental samples. In this research field, data analysis plays a crucial role. To teach targeted metagenomics data analysis, we developed a 4-week inquiry-driven modular course-based undergraduate research experience (mCURE) using publicly available Australian coral microbiome DNA sequencing data and associated metadata.
Biol Lett
January 2025
Department of Ecology and Evolutionary Biology, Princeton University, Princeton, NJ 08544, USA.
Bacterial strains that inhabit the gastrointestinal tracts of hominids have diversified in parallel (co-diversified) with their host species. The extent to which co-diversification has been mediated by partner fidelity between strains and hosts or by geographical distance between hosts is not clear due to a lack of strain-level data from clades of hosts with unconfounded phylogenetic relationships and geographical distributions. Here, I tested these competing hypotheses through meta-analyses of 7121 gut bacterial genomes assembled from wild-living ape species and subspecies sampled throughout their ranges in equatorial Africa.