Motivation: Most modern seed-and-extend NGS read mappers employ a seeding scheme that requires extracting non-overlapping seeds from each read in order to find all valid mappings under a given edit distance threshold. As the threshold grows, this seeding scheme forces mappers to use more and shorter seeds, which increases the number of seed hits (seed frequencies) and therefore reduces the efficiency of mappers.
Results: We propose a novel seeding framework, context-aware seeds (CAS). CAS guarantees finding all valid mappings but uses fewer (and longer) seeds, which reduces seed frequencies and increases the efficiency of mappers. CAS achieves this improvement by attaching a confidence radius to each seed in the reference. We prove that all valid mappings can be found if the sum of the confidence radii of the seeds is greater than the edit distance threshold. CAS generalizes the existing pigeonhole-principle-based seeding scheme, in which this confidence radius is implicitly always 1. Moreover, we design an efficient algorithm that constructs the confidence radius database in linear time. We test CAS on a genome and show that CAS significantly reduces seed frequencies when compared with the state-of-the-art pigeonhole-principle-based seeding algorithm, the Optimal Seed Solver.
Availability: https://github.com/Kingsford-Group/CAS_code.
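The sufficiency condition stated in the Results can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the radii are made-up integers, and how a confidence radius is actually computed is not described in the abstract. The pigeonhole split assumes the textbook formulation, in which a read with at most `threshold` edits is cut into `threshold + 1` non-overlapping seeds so that at least one seed is error-free.

```python
from typing import List, Sequence


def pigeonhole_seeds(read: str, threshold: int) -> List[str]:
    """Split `read` into threshold + 1 non-overlapping, near-equal seeds.

    Textbook pigeonhole seeding (an assumption, not taken from the paper):
    with at most `threshold` edits, at least one seed has no edit.
    """
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    n_seeds = threshold + 1
    if n_seeds > len(read):
        raise ValueError("read is too short for this threshold")
    base, extra = divmod(len(read), n_seeds)
    seeds, start = [], 0
    for i in range(n_seeds):
        # The first `extra` seeds take one additional base.
        length = base + (1 if i < extra else 0)
        seeds.append(read[start:start + length])
        start += length
    return seeds


def radii_suffice(radii: Sequence[int], threshold: int) -> bool:
    """Condition from the abstract: the sum of radii exceeds the threshold."""
    return sum(radii) > threshold


def seeds_needed(radii_in_order: Sequence[int], threshold: int) -> int:
    """Count the seeds, taken in order, needed to satisfy the condition."""
    total = 0
    for count, radius in enumerate(radii_in_order, start=1):
        total += radius
        if total > threshold:
            return count
    raise ValueError("the radii never exceed the threshold")


if __name__ == "__main__":
    read = "ACGT" * 25  # hypothetical 100-base read
    for t in (4, 9):
        seeds = pigeonhole_seeds(read, t)
        print(t, len(seeds), min(len(s) for s in seeds))
    # Radius 1 everywhere reduces to the pigeonhole count of threshold + 1.
    print(seeds_needed([1] * 10, 4))
    # Larger (made-up) radii satisfy the same condition with fewer seeds.
    print(seeds_needed([3, 2, 1], 4))
```

With every radius fixed at 1, the condition can only be met by `threshold + 1` seeds, which is the pigeonhole case. Any seed with a radius above 1 lets the same threshold be covered by fewer, and therefore longer, seeds.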
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7245042 | PMC |
| http://dx.doi.org/10.1186/s13015-020-00172-3 | DOI Listing |