Memory-bound k-mer selection for large and evolutionarily diverse reference libraries.

Genome Res

Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, California 92093, USA;

Published: October 2024

Finding sequence matches with k-mers is increasingly common in many bioinformatic applications, including metagenomic sequence classification. The accuracy of these downstream applications depends on the density of the reference databases, which are growing rapidly. Although increased density promises improvements in accuracy, scalability is a concern: reference k-mers are kept in memory at query time, and storing all k-mers of these ever-expanding databases is fast becoming impractical. Several subsampling strategies have been proposed, including minimizers and the selection of taxon-specific k-mers. However, we contend that these strategies are inadequate, especially when reference sets are taxonomically imbalanced, as most microbial libraries are. In this paper, we explore approaches for selecting a fixed-size subset of the k-mers present in an ultra-large data set to include in a library such that read classification suffers the least. Our experiments demonstrate the limitations of existing approaches, especially for novel and poorly sampled groups. We propose a library construction algorithm called k-mer RANKer (KRANK) that combines several components, including a hierarchical selection strategy with adaptive size restrictions and an equitable coverage strategy. We implement KRANK in highly optimized code and combine it with the locality-sensitive hashing classifier CONSULT-II to build a taxonomic classification and profiling method. On several benchmarks, KRANK k-mer selection significantly reduces memory consumption with minimal loss in classification accuracy. In extensive analyses based on CAMI benchmarks, we show that KRANK outperforms k-mer-based alternatives in taxonomic profiling and comes close to the best marker-based methods in accuracy.
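The abstract names minimizers as one existing way to subsample reference k-mers. As a minimal sketch of that baseline only (it is not KRANK's hierarchical selection), the Python below keeps, for every window of w consecutive k-mers, the k-mer with the smallest hash. The blake2b ordering, the function names `kmers`, `kmer_hash` and `minimizers`, and the toy sequence are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of (w, k)-minimizer subsampling, one of the existing
# strategies the abstract mentions. NOT KRANK's selection algorithm.
# Assumptions: blake2b as the k-mer ordering; all names are hypothetical.
import hashlib
from typing import Iterator, Set, Tuple


def kmers(seq: str, k: int) -> Iterator[Tuple[int, str]]:
    """Yield (position, k-mer) for every k-mer in seq."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


def kmer_hash(kmer: str) -> int:
    """Deterministic 64-bit hash used to order k-mers (hypothetical choice)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minimizers(seq: str, k: int, w: int) -> Set[str]:
    """Keep the smallest-hash k-mer of every window of w consecutive k-mers."""
    if k <= 0 or w <= 0:
        raise ValueError("k and w must be positive")
    all_kmers = list(kmers(seq, k))
    if not all_kmers:
        return set()
    w = min(w, len(all_kmers))  # a short sequence forms a single window
    selected: Set[str] = set()
    for start in range(len(all_kmers) - w + 1):
        window = all_kmers[start:start + w]
        # Ties (identical k-mers or hash collisions) are broken by position.
        _, best = min(window, key=lambda item: (kmer_hash(item[1]), item[0]))
        selected.add(best)
    return selected


if __name__ == "__main__":
    reference = "ACGTACGTTGCAAGCTTAGGCTAACGT"  # toy sequence, not real data
    full = {km for _, km in kmers(reference, 8)}
    sampled = minimizers(reference, k=8, w=5)
    print(f"kept {len(sampled)} of {len(full)} distinct 8-mers")
```

With a random ordering, roughly 2/(w+1) of k-mer positions are retained, so w acts as a memory knob. The rule is oblivious to taxonomy: every genome is thinned at the same rate regardless of how densely its clade is sampled, which is the kind of imbalance the authors argue such strategies handle poorly.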


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11529837
DOI: http://dx.doi.org/10.1101/gr.279339.124
