BlastFrost is a highly efficient method for querying 100,000s of genome assemblies. It builds on Bifrost, a dynamic data structure for compacted and colored de Bruijn graphs. BlastFrost queries a Bifrost data structure for sequences of interest and extracts local subgraphs, which makes it possible to determine the presence or absence of individual genes or single-nucleotide sequence variants. We show two examples using Salmonella genomes: detecting, within minutes, genes of the SPI-2 pathogenicity island in a collection of 926 genomes, and identifying single-nucleotide polymorphisms associated with fluoroquinolone resistance in three genes among 190,209 genomes. BlastFrost is available on GitHub (https://github.com/nluhmann/BlastFrost/tree/master/data).
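
To make the idea of querying a colored de Bruijn graph more concrete, the following is a minimal sketch of a k-mer presence query. It is not the BlastFrost or Bifrost implementation: it uses a plain dictionary instead of a compacted graph, and the names `build_index`, `query`, and the toy genomes are hypothetical, chosen only for illustration. Each k-mer is mapped to the set of genomes ("colors") that contain it, and a query sequence is scored by the fraction of its k-mers found in each genome.

```python
from collections import defaultdict

# Translation table for complementing DNA bases (assumes only A, C, G, T).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmers(seq, k):
    """Yield the canonical form of every k-mer in seq, in order."""
    for i in range(len(seq) - k + 1):
        yield canonical(seq[i:i + k])


def build_index(genomes, k):
    """Map each canonical k-mer to the set of genome names (colors) containing it.

    Toy stand-in for a colored de Bruijn graph; no unitig compaction is done.
    """
    index = defaultdict(set)
    for name, seq in genomes.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return index


def query(index, seq, k):
    """Return, per genome, the fraction of the query's k-mers present in it."""
    hits = defaultdict(int)
    total = 0
    for kmer in kmers(seq, k):
        total += 1
        # .get avoids creating empty entries in the defaultdict for misses.
        for name in index.get(kmer, ()):
            hits[name] += 1
    if total == 0:
        return {}
    return {name: count / total for name, count in hits.items()}


if __name__ == "__main__":
    # Hypothetical toy genomes, far shorter than real assemblies.
    genomes = {"g1": "ACGTACGTGA", "g2": "TTTTTTTTTT"}
    index = build_index(genomes, k=4)
    print(query(index, "ACGTACG", k=4))
```

A genome whose score is close to 1 contains nearly all k-mers of the query, while a single-nucleotide variant would remove up to k consecutive k-mers. This sketch only counts hits and does not model the subgraph extraction described above.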


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7798312
DOI: http://dx.doi.org/10.1186/s13059-020-02237-3

