A completeness-independent method for pre-selection of closely related genomes for species delineation in prokaryotes.

BMC Genomics

Laboratory of Hepatobiliary and Pancreatic Surgery, The Affiliated Hospital of Guilin Medical University, Guilin, Guangxi, 541001, People's Republic of China.

Published: February 2020

Background: Whole-genome approaches are widely preferred for species delineation in prokaryotes. However, these methods require pairwise alignments and calculations at the whole-genome level and thus are computationally intensive. To address this problem, a strategy consisting of sieving (pre-selecting closely related genomes) followed by alignment and calculation has been proposed.
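
The sieve-then-compute strategy can be sketched in Python as below. This is a minimal illustration under stated assumptions: `cheap_score` stands in for any inexpensive pre-selection score (for example a tetranucleotide frequency correlation), `expensive_score` stands in for a whole-genome alignment-based calculation, and both, like the function name `sieve_then_compute` and the threshold, are hypothetical placeholders rather than part of any published tool.

```python
from itertools import combinations
from typing import Callable, Dict, Tuple


def sieve_then_compute(
    genomes: Dict[str, str],
    cheap_score: Callable[[str, str], float],      # hypothetical fast pre-selection score
    expensive_score: Callable[[str, str], float],  # hypothetical whole-genome calculation
    sieve_threshold: float,
) -> Tuple[Dict[Tuple[str, str], float], int]:
    """Run the expensive comparison only on genome pairs that pass a cheap sieve.

    Returns the expensive scores of the retained pairs and the number of
    expensive calls that were actually made.
    """
    results: Dict[Tuple[str, str], float] = {}
    expensive_calls = 0
    for a, b in combinations(sorted(genomes), 2):
        # Step 1 (sieving): skip pairs whose cheap score falls below the threshold.
        if cheap_score(genomes[a], genomes[b]) < sieve_threshold:
            continue
        # Step 2 (alignment and calculation): only sieved pairs reach this point.
        results[(a, b)] = expensive_score(genomes[a], genomes[b])
        expensive_calls += 1
    return results, expensive_calls
```

Only pairs that pass the sieve reach `expensive_score`, so the number of costly comparisons drops from all n(n-1)/2 pairs to the sieved subset. A sieve with low sensitivity would silently discard true close relatives, which is why sensitivity is the key property of the pre-selection step.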

Results: Here, we first test a published approach called "genome-wide tetranucleotide frequency correlation coefficient" (TETRA), which is specifically tailored for sieving. Our results show that sieving by TETRA requires > 40% completeness for both genomes of a pair to yield > 95% sensitivity, indicating that TETRA is completeness-dependent. Accordingly, we develop a novel algorithm called "fragment tetranucleotide frequency correlation coefficient" (FRAGTE), which uses fragments rather than whole genomes for sieving. Our results show that FRAGTE achieves ~ 100% sensitivity and high specificity on simulated genomes, real genomes and metagenome-assembled genomes, demonstrating that FRAGTE is completeness-independent. Additionally, FRAGTE sieves fewer genomes in total for subsequent alignment and calculation, which greatly improves computational efficiency after sieving. FRAGTE also reduces the computational cost of the sieving step itself. Consequently, FRAGTE substantially improves run efficiency both for sieving and for the subsequent alignment and calculation, together accelerating genome-wide species delineation.

Conclusions: FRAGTE is a completeness-independent algorithm for sieving. Given its high sensitivity and specificity, the greatly reduced number of sieved genomes and the much shorter runtime, FRAGTE will help whole-genome approaches facilitate taxonomic studies in prokaryotes.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7045542
DOI: http://dx.doi.org/10.1186/s12864-020-6597-x