Comprehensive collections approaching millions of sequenced genomes have become central information sources in the life sciences. However, the rapid growth of these collections has made it effectively impossible to search these data using tools such as BLAST and its successors. Here, we present a technique called phylogenetic compression, which uses evolutionary history to guide compression and efficiently search large collections of microbial genomes using existing algorithms and data structures. We show that, when applied to modern diverse collections approaching millions of genomes, lossless phylogenetic compression improves the compression ratios of assemblies, de Bruijn graphs, and k-mer indexes by one to two orders of magnitude. Additionally, we develop a pipeline for a BLAST-like search over these phylogeny-compressed reference data, and demonstrate that it can align genes, plasmids, or entire sequencing experiments against all bacteria sequenced up to 2019 on ordinary desktop computers within a few hours. Phylogenetic compression has broad applications in computational biology and may provide a fundamental design principle for future genomics infrastructure.
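As a toy illustration of why phylogeny-guided ordering can help an off-the-shelf compressor (a sketch of the general idea, not the authors' pipeline), the code below builds two clades of closely related synthetic genomes and compresses their concatenation with zlib in two orders: a left-to-right leaf order of a hypothetical tree, and an interleaved order that separates relatives. The genomes, the tree, the mutation rate, and the helper names `mutate`, `leaf_order`, and `compressed_size` are all assumptions made for this example. zlib is chosen only because its 32 KiB window makes the locality effect visible at small scale.

```python
import random
import zlib

GENOME_LEN = 20_000   # one genome fits in zlib's 32 KiB window; two do not
MUTATION_RATE = 0.01  # toy per-site substitution probability (assumption)
ALPHABET = "ACGT"


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Return a copy of seq with random point substitutions."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in ALPHABET if b != base]))
        else:
            out.append(base)
    return "".join(out)


def leaf_order(tree) -> list:
    """Left-to-right (depth-first) order of leaves in a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    order = []
    for child in tree:
        order.extend(leaf_order(child))
    return order


def compressed_size(names, genomes) -> int:
    """Bytes needed for the concatenated genomes after zlib (DEFLATE) compression."""
    data = "".join(genomes[n] for n in names).encode("ascii")
    return len(zlib.compress(data, 9))


rng = random.Random(42)
# Two unrelated clade ancestors; each clade has three closely related members.
genomes = {}
for clade in "AB":
    ancestor = "".join(rng.choice(ALPHABET) for _ in range(GENOME_LEN))
    for i in range(1, 4):
        genomes[f"{clade}{i}"] = mutate(ancestor, MUTATION_RATE, rng)

# Hypothetical phylogeny written as nested tuples (leaves are genome names).
tree = (("A1", ("A2", "A3")), ("B1", ("B2", "B3")))
phylo_order = leaf_order(tree)                      # A1 A2 A3 B1 B2 B3
interleaved = ["A1", "B1", "A2", "B2", "A3", "B3"]  # relatives 40 kb apart

phylo_size = compressed_size(phylo_order, genomes)
interleaved_size = compressed_size(interleaved, genomes)
print("phylogenetic order:", phylo_size, "bytes")
print("interleaved order: ", interleaved_size, "bytes")
```

In the phylogenetic order each genome directly follows a close relative that is still inside the compressor's window, so most of it can be encoded as back-references; in the interleaved order the nearest relative lies 40 kb back, out of reach. Compressors with larger windows (such as xz) hit the same limitation once a collection is much larger than their window.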
Source | Link
---|---
PMC | http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10153118
DOI Listing | http://dx.doi.org/10.1101/2023.04.15.536996
Anat Rec (Hoboken)
January 2025
Department of Anatomy, Cell Biology & Physiology, Indiana University School of Medicine, Indianapolis, Indiana, USA.
Diet is one of a limited set of key ecological parameters defining primate species. A detailed understanding of dental functional correlates with primate diet is a key component for accurate dietary inference in fossil primates. Although considerable effort has been devoted to understanding post-canine dental function, incisor function remains poorly understood.
Parasit Vectors
December 2024
United States Department of Agriculture, Agricultural Research Service, Beltsville Agricultural Research Centre, Animal Parasitic Diseases Laboratory, Beltsville, MD, 20705-2350, USA.
Background: Parasites in the apicomplexan genus Sarcocystis infect cattle worldwide. Assessing the economic importance of each such parasite species requires proper diagnosis. Sarcocystis cruzi, a thin-walled species, infects virtually all cattle.
bioRxiv
December 2024
Department of Biological Sciences, University of South Carolina, 715 Sumter St. Columbia, SC 29208 USA.
Premise: Adaptive radiation in ecologically and morphologically diverse plant lineages presents an opportunity to investigate the rapid evolution of novel floral traits. While some types of floral traits, such as flower color, are well-characterized, other types of complex morphologies remain understudied. One example is occluded personate flowers, dorso-ventrally compressed flowers with obstructed floral passageways, which have evolved in multiple genera, but have only been characterized from snapdragon.
Based on morphological and genetic data, we describe a new species of Tropidophorus from the tropical karst landscape in southeastern Yunnan Province, China, close to the Vietnam border. Phylogenetically, the new species forms a clade with T. baviensis, T.