Seed-chain-extend with k-mer seeds is a powerful heuristic technique for sequence alignment used by modern sequence aligners. Although effective in practice for both runtime and accuracy, theoretical guarantees on the resulting alignment do not exist for seed-chain-extend. In this work, we give the first rigorous bounds for the efficacy of seed-chain-extend with k-mers. Assume we are given a random nucleotide sequence of length ∼n that is indexed (or seeded) and a mutated substring of length ∼m ≤ n with mutation rate θ < 0.206. We prove that we can find a k = Θ(log n) for the k-mer size such that the expected runtime of seed-chain-extend under optimal linear-gap cost chaining and quadratic time gap extension is O(m n^{f(θ)} log n), where f(θ) < 2.43 · θ holds as a loose bound. The alignment also turns out to be good; we prove that more than a [Formula: see text] fraction of the homologous bases is under an optimal chain. We also show that our bounds work when k-mers are sketched, that is, only a subset of all k-mers is selected, and that sketching reduces chaining time without increasing alignment time or decreasing accuracy too much, justifying the effectiveness of sketching as a practical speedup in sequence alignment. We verify our results in simulation and on real noisy long-read data and show that our theoretical runtimes can predict real runtimes accurately. We conjecture that our bounds can be improved further, and in particular, f(θ) can be further reduced.
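The seeding and chaining steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `seed` and `chain`, the unit score per anchor, and the `gap_weight` parameter are assumptions made for illustration. The chaining here is a simple quadratic-time dynamic program rather than the optimal chaining algorithm whose runtime the paper analyzes, and the extension step and sketching are omitted.

```python
def seed(ref, query, k):
    """Return sorted anchors (ref_pos, query_pos) where the k-mers match exactly."""
    # Index every k-mer of the reference by its start positions.
    index = {}
    for i in range(len(ref) - k + 1):
        index.setdefault(ref[i:i + k], []).append(i)
    # Look up every k-mer of the query in the index.
    anchors = [(i, j)
               for j in range(len(query) - k + 1)
               for i in index.get(query[j:j + k], [])]
    anchors.sort()
    return anchors


def chain(anchors, gap_weight=0.01):
    """Pick a best-scoring colinear chain of anchors.

    Assumed scoring (illustrative only): each anchor adds 1, and linking
    anchor a to a later anchor b costs gap_weight * ((ib - ia) + (jb - ja)),
    i.e. a cost that is linear in the gap. Runs in O(N^2) for N anchors.
    """
    if not anchors:
        return []
    n = len(anchors)
    score = [1.0] * n   # best chain score ending at each anchor
    back = [-1] * n     # predecessor on that best chain, -1 if none
    for b in range(n):
        ib, jb = anchors[b]
        for a in range(b):
            ia, ja = anchors[a]
            # Predecessors must come strictly earlier in both sequences.
            if ia < ib and ja < jb:
                cand = score[a] + 1.0 - gap_weight * ((ib - ia) + (jb - ja))
                if cand > score[b]:
                    score[b] = cand
                    back[b] = a
    # Trace back from the highest-scoring anchor.
    best = max(range(n), key=score.__getitem__)
    out = []
    while best != -1:
        out.append(anchors[best])
        best = back[best]
    out.reverse()
    return out
```

Calling `chain(seed(ref, query, k))` returns a list of anchors that increase strictly in both coordinates. In a full seed-chain-extend pipeline, the gaps between consecutive chained anchors would then be filled by a quadratic-time alignment, which this sketch leaves out.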
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10538486 | PMC |
| http://dx.doi.org/10.1101/gr.277637.122 | DOI Listing |
Proc Natl Acad Sci U S A
January 2025
Innovative Genomics Institute, University of California, Berkeley, CA 94720.
The widespread application of genome editing to treat and cure disease requires the delivery of genome editors into the nucleus of target cells. Enveloped delivery vehicles (EDVs) are engineered virally derived particles capable of packaging and delivering CRISPR-Cas9 ribonucleoproteins (RNPs). However, the presence of lentiviral genome encapsulation and replication proteins in EDVs has obscured the underlying delivery mechanism and precluded particle optimization.
Vet Med Sci
January 2025
Department of Genetics, Faculty of Veterinary Medicine, Yozgat Bozok University, Yozgat, Türkiye.
Background: Determining the complete genome sequence data of adenoviruses has recently become greatly important due to their use by scientists as vectors in cancer studies and other fields, including vaccine development. However, the GenBank database currently has few complete genome sequences of adenoviruses, which are known for their large genomes. To address this gap, we analysed next-generation sequencing data obtained from our previous study to provide the complete genome sequence of the canine adenovirus-2 strain.
Microbiol Spectr
January 2025
Department of Laboratory Medicine, National University Hospital, Singapore, Singapore.
The Mycobacterium avium complex (MAC) is a common cause of nontuberculous mycobacterial (NTM) pulmonary disease worldwide. Whole-genome sequencing was performed on a total of 203 retrospective MAC isolates from respiratory specimens. Phylogenomic analysis identified eight subspecies and species.
Appl Environ Microbiol
January 2025
Department of Biological Sciences, Minnesota State University Mankato, Mankato, Minnesota, USA.
Flavobacterium psychrophilum causes bacterial cold-water disease (BCWD) in salmonids and other fish, resulting in substantial economic losses in aquaculture worldwide. The mechanisms F. psychrophilum uses to cause disease are poorly understood. Despite considerable effort, most strains of F. psychrophilum have resisted attempts at genetic manipulation.
HLA
January 2025
Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Pirogov Medical University, Moscow, Russia.
The new HLA-C*12:02:55 allele showed one synonymous nucleotide difference compared to the HLA-C*12:02:02:01 allele in codon 134.