Small open reading frames (smORFs) shorter than 100 codons are widespread and perform essential roles in microorganisms, where they encode proteins active in several cell functions, including signal pathways, stress response, and antibacterial activities. However, the ecology, distribution and role of small proteins in the global microbiome remain unknown. Here, we construct a global microbial smORFs catalog (GMSC) derived from 63,410 publicly available metagenomes across 75 distinct habitats and 87,920 high-quality isolate genomes.
View Article and Find Full Text PDFNovel antibiotics are urgently needed to combat the antibiotic-resistance crisis. We present a machine-learning-based approach to predict antimicrobial peptides (AMPs) within the global microbiome and leverage a vast dataset of 63,410 metagenomes and 87,920 prokaryotic genomes from environmental and host-associated habitats to create the AMPSphere, a comprehensive catalog comprising 863,498 non-redundant peptides, few of which match existing databases. AMPSphere provides insights into the evolutionary origins of peptides, including by duplication or gene truncation of longer sequences, and we observed that AMP production varies by habitat.
View Article and Find Full Text PDFMeta'omic data on microbial diversity and function accrue exponentially in public repositories, but derived information is often siloed according to data type, study or sampled microbial environment. Here we present SPIRE, a Searchable Planetary-scale mIcrobiome REsource that integrates various consistently processed metagenome-derived microbial data modalities across habitats, geography and phylogeny. SPIRE encompasses 99 146 metagenomic samples from 739 studies covering a wide array of microbial environments and augmented with manually-curated contextual data.
View Article and Find Full Text PDFNovel antibiotics are urgently needed to combat the antibiotic-resistance crisis. We present a machine learning-based approach to predict prokaryotic antimicrobial peptides (AMPs) by leveraging a vast dataset of 63,410 metagenomes and 87,920 microbial genomes. This led to the creation of AMPSphere, a comprehensive catalog comprising 863,498 non-redundant peptides, the majority of which were previously unknown.
View Article and Find Full Text PDFBuffalo is an important livestock species. Here, we present a comprehensive metagenomic survey of the microbial communities along the buffalo digestive tract. We analysed 695 samples covering eight different sites in three compartments (four-chambered stomach, intestine, and rectum).
View Article and Find Full Text PDFmBodyMap is a curated database for microbes across the human body and their associations with health and diseases. Its primary aim is to promote the reusability of human-associated metagenomic data and assist with the identification of disease-associated microbes by consistently annotating the microbial contents of collected samples using state-of-the-art toolsets and manually curating the meta-data of corresponding human hosts. mBodyMap organizes collected samples based on their association with human diseases and body sites to enable cross-dataset integration and comparison.
View Article and Find Full Text PDF