Trait biases in microbial reference genomes.

Sci Data

Department of Biology, University of Oregon, Eugene, USA.

Published: February 2023

Common culturing techniques and priorities bias our discovery towards specific traits that may not be representative of microbial diversity in nature. So far, these biases have not been systematically examined. To address this gap, here we use 116,884 publicly available metagenome-assembled genomes (MAGs, completeness ≥80%) from 203 surveys worldwide as a culture-independent sample of bacterial and archaeal diversity, and compare these MAGs to the popular RefSeq genome database, which relies heavily on cultures. We compare the distribution of 12,454 KEGG gene orthologs (used as trait proxies) in the MAGs and RefSeq genomes, while controlling for environment type (ocean, soil, lake, bioreactor, human, and other animals). Using statistical modeling, we then determine the conditional probability that a species is represented in RefSeq given its genetic repertoire. We find that the majority of examined genes show a significant bias, either for or against, in RefSeq. Our systematic estimates of gene prevalences across bacteria and archaea in nature, and of gene-specific biases in reference genomes, constitute a resource for addressing these issues in the future.
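The abstract does not specify the statistical model. For illustration only, the sketch below shows one standard way to compare a gene's presence between two genome sets while stratifying by a confounder such as environment type: a Mantel-Haenszel pooled odds ratio over per-environment 2x2 tables. This is a swapped-in technique, not the authors' method, and the names `Stratum`, `p_refseq_given_gene`, and `mantel_haenszel_or` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """Hypothetical 2x2 counts for one gene within one environment type."""
    ref_with: int     # RefSeq genomes carrying the gene
    ref_without: int  # RefSeq genomes lacking the gene
    mag_with: int     # MAGs carrying the gene
    mag_without: int  # MAGs lacking the gene


def p_refseq_given_gene(s: Stratum) -> float:
    """Within one stratum: fraction of gene-carrying genomes that are RefSeq genomes."""
    carriers = s.ref_with + s.mag_with
    if carriers == 0:
        raise ValueError("no genomes carry the gene in this stratum")
    return s.ref_with / carriers


def mantel_haenszel_or(strata: list[Stratum]) -> float:
    """Pooled odds ratio of carrying the gene in RefSeq vs. MAGs, adjusted for
    stratum (environment type). OR > 1 suggests a bias for the gene in RefSeq,
    OR < 1 a bias against it."""
    num = 0.0
    den = 0.0
    for s in strata:
        n = s.ref_with + s.ref_without + s.mag_with + s.mag_without
        if n == 0:
            continue  # an empty stratum contributes nothing
        num += s.ref_with * s.mag_without / n
        den += s.ref_without * s.mag_with / n
    if den == 0:
        raise ValueError("odds ratio undefined: zero denominator")
    return num / den


if __name__ == "__main__":
    # Made-up counts for illustration only; these are not data from the paper.
    strata = {
        "ocean": Stratum(ref_with=20, ref_without=10, mag_with=10, mag_without=20),
        "soil": Stratum(ref_with=5, ref_without=15, mag_with=30, mag_without=50),
    }
    for env, s in strata.items():
        print(env, round(p_refseq_given_gene(s), 3))
    print("MH odds ratio:", round(mantel_haenszel_or(list(strata.values())), 3))
```

The naive per-stratum ratio from `p_refseq_given_gene` depends on how many genomes each set contains, which is one reason a pooled odds ratio or a fitted regression model is usually preferred for this kind of comparison.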


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9911409
DOI: http://dx.doi.org/10.1038/s41597-023-01994-7
