AI Article Synopsis

  • Metagenomics is a powerful tool for examining microbial communities, but most research currently focuses on bacteria, neglecting eukaryotic microbes.
  • A new classifier called Whokaryote was developed to distinguish eukaryotic from prokaryotic contigs based on gene structure, with an estimated recall of 94%, precision of 96% and accuracy of 95%.
  • The enhanced version, Whokaryote+Tiara, adds predictions from the k-mer-based classifier Tiara as an extra feature. It outperforms all individual classifiers (F1 score of 0.99 for both eukaryotes and prokaryotes), and in a reanalysis of a plant endosphere metagenome it helped uncover biosynthetic gene clusters missed in the original study.
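Recall, precision, accuracy and F1 are linked by simple formulas. The sketch below writes them out from confusion-matrix counts and treats "eukaryotic" as the positive class; that choice is an assumption, since the abstract does not say how the reported figures were averaged across classes. Plugging in Whokaryote's estimated precision (0.96) and recall (0.94) gives an F1 of roughly 0.95, which puts the 0.99 reported for Whokaryote+Tiara in context.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of contigs called eukaryotic that truly are eukaryotic."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of truly eukaryotic contigs that were called eukaryotic."""
    return tp / (tp + fn) if tp + fn else 0.0


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all contigs that were classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


# Whokaryote-only estimates quoted above: precision 0.96, recall 0.94.
print(round(f1(0.96, 0.94), 3))  # prints 0.95
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two values, so a classifier cannot reach a high F1 by maximizing only precision or only recall.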

Article Abstract

Metagenomics has become a prominent technology to study the functional potential of all organisms in a microbial community. Most studies focus on the bacterial content of these communities, while ignoring eukaryotic microbes. Indeed, many metagenomics analysis pipelines silently assume that all contigs in a metagenome are prokaryotic, likely resulting in less accurate annotation of eukaryotes in metagenomes. Early detection of eukaryotic contigs allows for eukaryote-specific gene prediction and functional annotation. Here, we developed a classifier that distinguishes eukaryotic from prokaryotic contigs based on foundational differences between these taxa in terms of gene structure. We first developed Whokaryote, a random forest classifier that uses intergenic distance, gene density and gene length as the most important features. We show that, with an estimated recall, precision and accuracy of 94, 96 and 95 %, respectively, this classifier with features grounded in biology can perform almost as well as the classifiers EukRep and Tiara, which use k-mer frequencies as features. By retraining our classifier with Tiara predictions as an additional feature, the weaknesses of both types of classifiers are compensated; the result is Whokaryote+Tiara, an enhanced classifier that outperforms all individual classifiers, with an F1 score of 0.99 for both eukaryotes and prokaryotes, while still being fast. In a reanalysis of metagenome data from a disease-suppressive plant endospheric microbial community, we show how using Whokaryote+Tiara to select contigs for eukaryotic gene prediction facilitates the discovery of several biosynthetic gene clusters that were missed in the original study. Whokaryote (+Tiara) is wrapped in an easily installable package and is freely available from https://github.com/LottePronk/whokaryote.
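Whokaryote's most important features are intergenic distance, gene density and gene length, all derivable from predicted gene coordinates on a contig. The sketch below shows one plausible way to compute such features, and how a Tiara prediction could be appended as an extra feature in the spirit of Whokaryote+Tiara. The `Gene` class, the exact feature definitions (means, genes per kb) and the `"eukarya"` label string are hypothetical choices for illustration, not Whokaryote's actual code. The resulting vectors would then be passed to a random forest (for example scikit-learn's `RandomForestClassifier`), which is not reproduced here.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Gene:
    """A predicted gene on a contig; 1-based, inclusive coordinates (assumed)."""
    start: int
    end: int


def structure_features(genes: list[Gene], contig_length: int) -> dict[str, float]:
    """Hypothetical gene-structure features for one contig.

    Prokaryotic genomes tend to be gene-dense with short intergenic gaps,
    whereas eukaryotic genes are usually sparser and separated by longer gaps.
    """
    if contig_length <= 0:
        raise ValueError("contig_length must be positive")
    ordered = sorted(genes, key=lambda g: g.start)
    lengths = [g.end - g.start + 1 for g in ordered]
    # Gap between consecutive genes; overlapping genes count as distance 0.
    gaps = [max(0, nxt.start - cur.end - 1) for cur, nxt in zip(ordered, ordered[1:])]
    return {
        "mean_gene_length": mean(lengths) if lengths else 0.0,
        "mean_intergenic_distance": mean(gaps) if gaps else 0.0,
        "gene_density": len(ordered) / (contig_length / 1000),  # genes per kb
    }


def feature_vector(genes: list[Gene], contig_length: int,
                   tiara_label: Optional[str] = None) -> list[float]:
    """Feature vector for a classifier, optionally stacking a Tiara call.

    Encoding Tiara's prediction as 1.0 for "eukarya" and 0.0 otherwise is a
    simplifying assumption of this sketch (hypothetical label handling).
    """
    feats = structure_features(genes, contig_length)
    vec = [feats["mean_intergenic_distance"],
           feats["gene_density"],
           feats["mean_gene_length"]]
    if tiara_label is not None:
        vec.append(1.0 if tiara_label == "eukarya" else 0.0)
    return vec
```

For example, three 900 bp genes starting at positions 1, 1000 and 2500 on a 4,000 bp contig give a mean intergenic distance of 349.5 bp and a density of 0.75 genes per kb. Keeping the gene-structure features and the k-mer-based prediction in one vector is what lets the downstream classifier learn when to trust each signal.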


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9465069
DOI: http://dx.doi.org/10.1099/mgen.0.000823

