Comprehensive collections approaching millions of sequenced genomes have become central information sources in the life sciences. However, the rapid growth of these collections has made it effectively impossible to search these data using tools such as BLAST and its successors. Here, we present a technique called phylogenetic compression, which uses evolutionary history to guide compression and efficiently search large collections of microbial genomes using existing algorithms and data structures. We show that, when applied to modern diverse collections approaching millions of genomes, lossless phylogenetic compression improves the compression ratios of assemblies, de Bruijn graphs, and k-mer indexes by one to two orders of magnitude. Additionally, we develop a pipeline for a BLAST-like search over these phylogeny-compressed reference data, and demonstrate it can align genes, plasmids, or entire sequencing experiments against all sequenced bacteria until 2019 on ordinary desktop computers within a few hours. Phylogenetic compression has broad applications in computational biology and may provide a fundamental design principle for future genomics infrastructure.
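
The core intuition, that placing evolutionarily related genomes next to each other helps an existing compressor, can be illustrated with a toy experiment. The sketch below is not the authors' pipeline: the synthetic genomes, the clade structure (known by construction rather than inferred from a phylogeny), the helper names `mutate` and `compressed_size`, and the choice of zlib are all assumptions made for illustration. Because zlib only looks back 32 KiB, it can reuse a relative's sequence only when that relative sits close by in the input.

```python
import random
import zlib

def mutate(seq, rate, rng):
    """Copy seq, redrawing each base at random with probability `rate`
    (a redraw may pick the same base, so the effective rate is slightly lower)."""
    return "".join(rng.choice("ACGT") if rng.random() < rate else b for b in seq)

def compressed_size(genomes):
    """Concatenate genomes in the given order; return the zlib-compressed size in bytes."""
    return len(zlib.compress("".join(genomes).encode("ascii"), 9))

rng = random.Random(0)
n_clades, per_clade, length = 8, 6, 20_000  # hypothetical toy parameters

# Each clade: one random ancestor plus several lightly mutated descendants.
clades = []
for _ in range(n_clades):
    ancestor = "".join(rng.choice("ACGT") for _ in range(length))
    clades.append([mutate(ancestor, 0.01, rng) for _ in range(per_clade)])

# Same genomes, two orderings: relatives adjacent vs. relatives far apart.
grouped = [g for clade in clades for g in clade]
interleaved = [clade[i] for i in range(per_clade) for clade in clades]

grouped_size = compressed_size(grouped)
interleaved_size = compressed_size(interleaved)
print("grouped by clade:", grouped_size, "bytes")
print("interleaved:     ", interleaved_size, "bytes")
```

Both orderings contain exactly the same data and remain losslessly recoverable; only the order differs. In the interleaved layout, relatives sit 160,000 bytes apart, beyond zlib's window, so each genome is compressed almost as if it were unrelated to the rest.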

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10153118
DOI: http://dx.doi.org/10.1101/2023.04.15.536996
