BlastFrost is a highly efficient method for querying hundreds of thousands of genome assemblies. It builds on Bifrost, a dynamic data structure for compacted and colored de Bruijn graphs. BlastFrost queries a Bifrost data structure for sequences of interest and extracts local subgraphs, which makes it possible to determine the presence or absence of individual genes or single-nucleotide sequence variants. We show two examples using Salmonella genomes: finding, within minutes, the genes of the SPI-2 pathogenicity island in a collection of 926 genomes, and identifying single-nucleotide polymorphisms associated with fluoroquinolone resistance in three genes among 190,209 genomes. BlastFrost is available at https://github.com/nluhmann/BlastFrost/tree/master/data.
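To make the idea of querying a colored k-mer structure concrete, here is a minimal Python sketch. It is not BlastFrost's or Bifrost's implementation: it replaces the compacted de Bruijn graph with a plain dictionary from k-mers to the set of genomes ("colors") that contain them, ignores reverse complements, and uses toy sequences. All names (`build_colored_index`, `query_fractions`, the genome labels) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every substring of length k in seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_colored_index(genomes, k):
    """Map each k-mer to the set of genome names (colors) that contain it."""
    index = defaultdict(set)
    for name, seq in genomes.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return index


def query_fractions(index, genome_names, query, k):
    """For each genome, return the fraction of the query's k-mers found in it."""
    query_kmers = list(kmers(query, k))
    if not query_kmers:
        raise ValueError("query is shorter than k")
    hits = {name: 0 for name in genome_names}
    for kmer in query_kmers:
        # .get avoids creating empty entries in the defaultdict
        for name in index.get(kmer, ()):
            hits[name] += 1
    return {name: count / len(query_kmers) for name, count in hits.items()}


# Toy data (assumed for illustration): genome_B differs from genome_A
# by a single substitution, genome_C is unrelated.
genomes = {
    "genome_A": "ACGTACGTTGCA",
    "genome_B": "ACGTACCTTGCA",
    "genome_C": "TTTTGGGGCCCC",
}
K = 4
index = build_colored_index(genomes, K)
fractions = query_fractions(index, genomes.keys(), "TACGTTG", K)
print(fractions)
```

In this toy setting, a genome that contains the whole query matches all of its k-mers, while a single substitution removes every k-mer overlapping the changed position. That drop in k-mer hits is the kind of signal a k-mer based search has to account for when it looks for sequence variants; the sketch does not attempt to reproduce how BlastFrost itself handles this.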
Download full-text PDF |
Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7798312 | PMC |
http://dx.doi.org/10.1186/s13059-020-02237-3 | DOI Listing |