The increasing throughput of DNA sequencing technologies creates a need for faster algorithms. Most reads are ultimately mapped to a reference sequence, typically a genome. Modern mappers rely on heuristics to gain speed at a reasonable cost in accuracy. In the seeding heuristic, short matches between the reads and the genome are used to narrow the search to a set of candidate locations. Several seeding variants used in modern mappers show good empirical performance, but they are difficult to calibrate or optimize because theoretical results are lacking. Here we develop a theory to estimate the probability that the correct location of a read is filtered out during seeding, resulting in mapping errors. We describe the properties of simple exact seeds, skip seeds, and MEM seeds (Maximal Exact Match seeds). The main innovation of this work is to use concepts from analytic combinatorics to represent reads as abstract sequences and to specify their generating functions in order to estimate the probabilities of interest. We provide several algorithms, which together give a workable solution to the problem of calibrating seeding heuristics for short reads. We also provide a C implementation of these algorithms in a library called Sesame. These results can improve current mapping algorithms and lay the foundation of a general strategy to tackle sequence alignment problems. The Sesame library is open source and available for download at https://github.com/gui11aume/sesame.
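
To make the central quantity concrete, the sketch below computes the probability that a read contains no error-free stretch long enough to serve as an exact seed, in which case the correct location cannot be found by seeding. It is a minimal illustration under simplifying assumptions that are not taken from the abstract: sequencing errors are independent with a fixed rate `p`, only the correct location is considered, and a seed is any run of `k` consecutive error-free nucleotides. The function name `prob_no_exact_seed` is hypothetical and is not part of the Sesame API; the recurrence is the standard one for runs in Bernoulli trials, not necessarily the method of the paper.

```python
def prob_no_exact_seed(n, k, p):
    """Probability that a read of n nucleotides contains no run of k
    consecutive error-free nucleotides.

    Assumptions (illustrative only): each nucleotide is read incorrectly
    with probability p, independently of the others.
    """
    q = 1.0 - p  # probability that one nucleotide is error-free
    # a[i] = probability that the first i nucleotides contain no
    # error-free run of length k. Prefixes shorter than k cannot contain one.
    a = [1.0] * (n + 1)
    for i in range(k, n + 1):
        if i == k:
            # The only possible run covers the whole prefix.
            a[i] = 1.0 - q ** k
        else:
            # A run is completed for the first time at position i when
            # the first i-k-1 nucleotides contain no run, position i-k
            # is an error, and the last k nucleotides are error-free.
            a[i] = a[i - 1] - a[i - k - 1] * p * q ** k
    return a[n]


if __name__ == "__main__":
    # Example: how the chance of having no seed varies with the seed
    # length for a 50-nucleotide read with a 1% error rate.
    for k in (12, 17, 22):
        print(k, prob_no_exact_seed(50, k, 0.01))
```

Under this simplified model, longer seeds are lost more often, which is the kind of trade-off a calibration procedure has to quantify; handling off-target matches, skip seeds, and MEM seeds requires the fuller theory described in the abstract.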

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7331467
DOI: http://dx.doi.org/10.3389/fgene.2020.00572
