Context-aware seeds for read mapping.

Algorithms Mol Biol

Computational Biology Department, Carnegie Mellon University, Pittsburgh, 15213 USA.

Published: May 2020

Motivation: Most modern seed-and-extend NGS read mappers employ a seeding scheme that requires extracting non-overlapping seeds from each read in order to find all valid mappings under an edit distance threshold of t. As t grows, this seeding scheme forces mappers to use more and shorter seeds, which increases the number of seed hits (seed frequencies) and therefore reduces the efficiency of mappers.
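The trade-off described above can be illustrated with a minimal sketch. It assumes the common convention that a valid mapping has at most t edits, so that t + 1 non-overlapping seeds guarantee at least one seed is untouched by any edit; the function name `pigeonhole_seeds` and the example read are hypothetical and not taken from the paper.

```python
def pigeonhole_seeds(read: str, t: int) -> list[tuple[int, str]]:
    """Split `read` into t + 1 non-overlapping seeds of near-equal length.

    Assumption: a valid mapping has at most t edits, so by the pigeonhole
    principle at least one of the t + 1 seeds matches the reference exactly.
    Returns (start_offset, seed) pairs.
    """
    k = t + 1
    if k > len(read):
        raise ValueError("read is too short for t + 1 non-empty seeds")
    base, extra = divmod(len(read), k)
    seeds, start = [], 0
    for i in range(k):
        # The first `extra` seeds get one additional base.
        length = base + (1 if i < extra else 0)
        seeds.append((start, read[start:start + length]))
        start += length
    return seeds


read = "ACGTACGTTAGCCGATAGGCTTAACGGA"  # hypothetical 28-base read
for t in (1, 3, 6):
    lengths = [len(seed) for _, seed in pigeonhole_seeds(read, t)]
    print(f"t={t}: {len(lengths)} seeds, lengths {lengths}")
```

For this 28-base read, the seed length drops from 14 at t = 1 to 4 at t = 6. Shorter seeds occur more often in a reference, which is the growth in seed frequency that the abstract describes.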

Results: We propose a novel seeding framework, context-aware seeds (CAS). CAS guarantees finding all valid mappings but uses fewer (and longer) seeds, which reduces seed frequencies and increases the efficiency of mappers. CAS achieves this improvement by attaching a confidence radius to each seed in the reference. We prove that all valid mappings can be found if the sum of the confidence radii of the seeds is greater than the edit distance threshold t. CAS generalizes the existing pigeonhole-principle-based seeding scheme, in which this confidence radius is implicitly always 1. Moreover, we design an efficient algorithm that constructs the confidence radius database in linear time. We evaluate CAS on a genome and show that CAS significantly reduces seed frequencies when compared with the state-of-the-art pigeonhole-principle-based seeding algorithm, the Optimal Seed Solver.
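The sufficiency condition stated above (the sum of confidence radii must exceed the edit distance threshold, written t here) can be sketched as follows. This is only an illustration of the condition, not the authors' implementation: the abstract does not say how confidence radii are computed, so the radii below are made-up inputs, and the names `covers_threshold` and `seeds_needed` are hypothetical.

```python
def covers_threshold(radii: list[int], t: int) -> bool:
    """Condition from the abstract: the sum of confidence radii exceeds t."""
    return sum(radii) > t


def seeds_needed(radii: list[int], t: int) -> int:
    """Count how many seeds, taken in the given order, are needed before
    the running sum of their confidence radii exceeds t."""
    total = 0
    for count, radius in enumerate(radii, start=1):
        total += radius
        if total > t:
            return count
    raise ValueError("the given seeds do not satisfy the condition for this t")


t = 5
# Pigeonhole-style seeding: every seed implicitly has radius 1.
pigeonhole_radii = [1] * 10
# Hypothetical context-aware radii (values invented for illustration).
cas_radii = [3, 2, 2]

print(seeds_needed(pigeonhole_radii, t))  # 6 seeds of radius 1
print(seeds_needed(cas_radii, t))         # 3 seeds: 3 + 2 + 2 = 7 > 5
```

With every radius fixed at 1, the condition reduces to needing t + 1 seeds. Larger radii let fewer seeds satisfy the same condition, so each seed can be longer.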

Availability: https://github.com/Kingsford-Group/CAS_code.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7245042
DOI: http://dx.doi.org/10.1186/s13015-020-00172-3
