Tools that classify sequencing reads against a database of reference sequences require efficient index data structures. The r-index is a compressed full-text index that answers substring presence/absence, count, and locate queries in space proportional to the amount of distinct sequence in the database: O(r) space, where r is the number of Burrows-Wheeler runs. To date, the r-index has lacked the ability to quickly classify matches according to which reference sequences (or sequence groupings, i.e., taxa) a match overlaps. We present new algorithms and methods for solving this problem. Specifically, given a collection D of documents, [Formula: see text] over an alphabet of size σ, we extend the r-index with [Formula: see text] additional words to support document listing queries for a pattern [Formula: see text] that occurs in [Formula: see text] documents in D in [Formula: see text] time and [Formula: see text] space, where w is the machine word size. Applied in a bacterial mock community experiment, our method is up to three times faster than a comparable method that uses the standard r-index locate queries. We show that our method classifies both simulated and real nanopore reads at the strain level with higher accuracy than other approaches. Finally, we present strategies for compacting this structure in applications in which read lengths or match lengths can be bounded.
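To make the two central notions concrete, the following is a minimal Python sketch, not the method of the paper. It assumes a plain, uncompressed suffix array in place of the r-index, and the names `bwt_runs`, `build_index`, and `list_documents` are hypothetical. `bwt_runs` counts the Burrows-Wheeler runs r of a short text, and `list_documents` answers a document listing query the baseline way: locate every occurrence of the pattern, then map each position back to its document.

```python
from bisect import bisect_left, bisect_right


def bwt_runs(text):
    """Count runs of equal characters in the BWT of `text`.

    Assumes `text` ends with a unique sentinel that sorts before
    every other character (for example "$").
    """
    n = len(text)
    # Naive suffix array: fine for a toy input, far too slow for genomes.
    sa = sorted(range(n), key=lambda i: text[i:])
    # BWT character = character preceding each suffix (wraps around for i == 0).
    bwt = [text[i - 1] for i in sa]
    return sum(1 for k in range(n) if k == 0 or bwt[k] != bwt[k - 1])


def build_index(docs, sep="\x01"):
    """Concatenate documents and build a naive suffix array.

    Assumes `sep` does not occur in any document or pattern.
    """
    text, starts = "", []
    for doc in docs:
        starts.append(len(text))  # start offset of this document
        text += doc + sep
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    return text, starts, sa


def list_documents(index, pattern):
    """Return the sorted ids of the documents that contain `pattern`."""
    text, starts, sa = index
    m = len(pattern)
    # Length-m prefixes of sorted suffixes are themselves sorted,
    # so the matches form one contiguous range of the suffix array.
    keys = [text[i:i + m] for i in sa]
    lo, hi = bisect_left(keys, pattern), bisect_right(keys, pattern)
    # Locate each occurrence, then map its text offset to a document id.
    return sorted({bisect_right(starts, sa[k]) - 1 for k in range(lo, hi)})


if __name__ == "__main__":
    index = build_index(["ACGTACGT", "TTACGA", "GGGG"])
    print(list_documents(index, "ACG"))
    print(bwt_runs("banana$"))
```

This baseline does work proportional to the total number of occurrences, even when they all fall in a few documents. The abstract describes a structure whose query time instead depends on the number of distinct documents containing the pattern.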
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10538492 | PMC |
| http://dx.doi.org/10.1101/gr.277642.123 | DOI Listing |