Nitrogen fixation, the process that incorporates inert atmospheric nitrogen into active biological processes, has been a major topic in biological and geochemical studies. Currently, insights into the diversity and distribution of nitrogen-fixing microbes depend on homology-based analyses of nitrogenase genes, especially the nifH gene, which are broadly conserved in nitrogen-fixing microbes. Here, we report a pitfall of using nifH as a marker of microbial nitrogen fixation. We exhaustively analyzed genomes in RefSeq (231,908 genomes) and KEGG (6,509 genomes) and the co-occurrence and gene order patterns of nitrogenase genes (including nifH) therein. Up to 20% of nifH-harboring genomes lacked nifD and nifK, which encode essential subunits of nitrogenase, within 10 coding sequences upstream or downstream of nifH or on the same genome. According to a phenotypic database of prokaryotes, no species or strain harboring only nifH possesses nitrogen-fixing activity, which shows that these genes are "pseudo"-nifH genes. Pseudo-nifH sequences mainly belong to anaerobic microbes, including members of the class Clostridia and methanogens. We also detected many pseudo-nifH reads in metagenomic sequences from anaerobic environments such as animal guts, wastewater, paddy soils, and sediments. In some samples, pseudo-nifH reads exceeded the number of "true" nifH reads by 50% or even 10-fold. Because of the high sequence similarity between pseudo-nifH and true nifH, a substantial fraction of nifH-like reads could not be confidently classified. Overall, our results encourage reconsideration of the conventional use of nifH for detecting nitrogen-fixing microbes, while suggesting that nifD or nifK would be a more reliable marker.

Nitrogen-fixing microbes affect biogeochemical cycling, agricultural productivity, and microbial ecosystems, and their distributions have been investigated intensively using genomic and metagenomic sequencing. Currently, insights into nitrogen fixers in the environment are acquired by homology searches against nitrogenase genes, particularly the nifH gene, in public databases. Here, we report that public databases include a significant number of incorrectly annotated nifH sequences (pseudo-nifH). We exhaustively investigated the genomic structures of nifH-harboring genomes and found hundreds of pseudo-nifH sequences in RefSeq and KEGG. Over half of these pseudo-nifH sequences belonged to members of the class Clostridia, which is regarded as a prominent nitrogen-fixing clade. We also found that the abundance of nitrogen fixers in metagenomes could be overestimated by 1.5 to >10 times because of the pseudo-nifH sequences recorded in public databases. Our results encourage reconsideration of the prevalent use of nifH as a marker of nitrogen-fixing microbes.
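The gene-neighborhood criterion described above (checking whether the partner nitrogenase subunit genes occur within 10 coding sequences of the marker gene, or anywhere on the same genome) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The function name `check_nifh_context`, the input format (an ordered list of gene names for one linear contig), and the rule that both partner genes must be present are assumptions made for illustration only.

```python
from typing import Dict, List, Sequence

# Partner genes encoding the other essential nitrogenase subunits.
# Requiring BOTH to be present is an assumption of this sketch.
REQUIRED = {"nifD", "nifK"}


def check_nifh_context(
    cds_order: Sequence[str], window: int = 10
) -> List[Dict[str, object]]:
    """Report, for every nifH in an ordered CDS list, whether nifD and nifK
    lie within `window` coding sequences on either side, and whether they
    occur anywhere in the list.

    The contig is treated as linear (no wrap-around for circular
    chromosomes), which is a simplification.
    """
    genome_genes = set(cds_order)
    results: List[Dict[str, object]] = []
    for i, gene in enumerate(cds_order):
        if gene != "nifH":
            continue
        lo = max(0, i - window)
        upstream = cds_order[lo:i]
        downstream = cds_order[i + 1 : i + 1 + window]
        neighbours = set(upstream) | set(downstream)
        results.append(
            {
                "index": i,
                "partners_nearby": REQUIRED <= neighbours,
                "partners_in_genome": REQUIRED <= genome_genes,
            }
        )
    return results


if __name__ == "__main__":
    # Hypothetical gene orders, not taken from any real genome.
    operon_like = ["geneA", "nifH", "nifD", "nifK", "geneB"]
    lone_marker = ["geneA", "geneB", "nifH", "geneC"]
    print(check_nifh_context(operon_like))
    print(check_nifh_context(lone_marker))
```

In this sketch, a nifH entry for which both flags are false would be a candidate pseudo-nifH. Real annotations would first need gene names normalized (for example through KEGG orthology identifiers), a step the sketch omits.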
| Full text | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8597730 | PMC |
| http://dx.doi.org/10.1128/msphere.00785-21 | DOI |