Efficient taxa identification using a pangenome index.

Genome Res

Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA.

Published: July 2023

Tools that classify sequencing reads against a database of reference sequences require efficient index data structures. The r-index is a compressed full-text index that answers substring presence/absence, count, and locate queries in space proportional to the amount of distinct sequence in the database: O(r) space, where r is the number of Burrows-Wheeler runs. To date, the r-index has lacked the ability to quickly classify matches according to which reference sequences (or sequence groupings, i.e., taxa) a match overlaps. We present new algorithms and methods for solving this problem. Specifically, given a collection D of documents, [Formula: see text] over an alphabet of size σ, we extend the r-index with [Formula: see text] additional words to support document listing queries for a pattern [Formula: see text] that occurs in [Formula: see text] documents in D in [Formula: see text] time and [Formula: see text] space, where w is the machine word size. Applied in a bacterial mock community experiment, our method is up to three times faster than a comparable method that uses the standard r-index locate queries. We show that our method classifies both simulated and real nanopore reads at the strain level with higher accuracy than other approaches. Finally, we present strategies for compacting this structure in applications in which read lengths or match lengths can be bounded.
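
To make the two central notions in the abstract concrete, here is a minimal Python sketch of what "Burrows-Wheeler runs" and a "document listing query" mean. It is a naive illustration only, not the r-index or the method described above: the transform is built by sorting all rotations, and document listing is a plain scan over every document. All function names, the separator characters, and the toy sequences are assumptions made for this example.

```python
from typing import List


def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations.

    Quadratic in time and space; for illustration only.
    Assumes `text` ends with a unique terminator such as '$'.
    """
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_runs(s: str) -> int:
    """Count maximal runs of equal characters (the quantity r when s is a BWT)."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])


def list_documents(docs: List[str], pattern: str) -> List[int]:
    """Naive document listing: indices of documents that contain `pattern`.

    This scans every document; it only shows what the query returns,
    not how an index would answer it efficiently.
    """
    return [i for i, doc in enumerate(docs) if pattern in doc]


if __name__ == "__main__":
    # Hypothetical toy collection; '#' separates documents, '$' terminates.
    docs = ["ACGTACGT", "ACGTTCGT", "GGGTACGA"]
    text = "#".join(docs) + "$"
    print("n =", len(text), "r =", count_runs(bwt(text)))
    print("documents containing TACG:", list_documents(docs, "TACG"))
```

On repetitive collections such as pangenomes, r tends to be much smaller than the total text length n, which is why an index whose size depends on r rather than n is attractive.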

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10538492
DOI: http://dx.doi.org/10.1101/gr.277642.123
