k-mer-based methods are widely used in bioinformatics, but there are many gaps in our understanding of their statistical properties. Here, we consider the simple model where a sequence S (e.g., a genome or a read) undergoes a simple mutation process through which each nucleotide is mutated independently with some probability r, under the assumption that there are no spurious k-mer matches. How does this process affect the k-mers of S? We derive the expectation and variance of the number of mutated k-mers and of the number of islands (a maximal interval of mutated k-mers) and oceans (a maximal interval of nonmutated k-mers). We then derive hypothesis tests and confidence intervals (CIs) for r given an observed number of mutated k-mers, or, alternatively, given the Jaccard similarity (with or without MinHash). We demonstrate the usefulness of our results using a few select applications: obtaining a CI to supplement the Mash distance point estimate, filtering out reads during alignment by Minimap2, and rating long-read alignments to a de Bruijn graph by Jabba.
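The mutation model described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: all function names are hypothetical, and only mutation flags are tracked rather than actual nucleotides, which mirrors the "no spurious matches" assumption. The closed-form expectation used here follows directly from independence (a k-mer is nonmutated with probability (1 - r)^k), and `estimate_r` is a plain inversion of that expectation, not the hypothesis test or CI procedure derived in the paper.

```python
import random


def mutate_positions(n, r, rng):
    """Flag each of n nucleotides as mutated independently with probability r."""
    return [rng.random() < r for _ in range(n)]


def kmer_flags(mutated, k):
    """Return one flag per k-mer: True if any of its k positions is mutated."""
    prefix = [0]
    for m in mutated:
        prefix.append(prefix[-1] + int(m))
    return [prefix[i + k] > prefix[i] for i in range(len(mutated) - k + 1)]


def count_runs(flags, value):
    """Count maximal runs of `value` (True gives islands, False gives oceans)."""
    runs, prev = 0, None
    for f in flags:
        if f == value and prev != value:
            runs += 1
        prev = f
    return runs


def expected_mutated(num_kmers, k, r):
    """Expected number of mutated k-mers, by linearity of expectation."""
    return num_kmers * (1.0 - (1.0 - r) ** k)


def estimate_r(n_mutated, num_kmers, k):
    """Point estimate of r obtained by inverting expected_mutated (assumption)."""
    return 1.0 - (1.0 - n_mutated / num_kmers) ** (1.0 / k)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    k, r, num_kmers = 21, 0.01, 10_000
    # A sequence with num_kmers k-mers has num_kmers + k - 1 nucleotides.
    flags = kmer_flags(mutate_positions(num_kmers + k - 1, r, rng), k)
    n_mut = sum(flags)
    print("mutated k-mers:", n_mut)
    print("expected:", expected_mutated(num_kmers, k, r))
    print("islands:", count_runs(flags, True))
    print("oceans:", count_runs(flags, False))
    print("estimated r:", estimate_r(n_mut, num_kmers, k))
```

Because neighboring k-mers share nucleotides, their mutation flags are strongly correlated, so a single run can deviate noticeably from the expectation. That dependence is why the variance, and not just the mean, matters when building a CI for r.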
DOI: http://dx.doi.org/10.1089/cmb.2021.0431
Genome Biol
January 2025
Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA.
Sequence alignment is foundational to many bioinformatic analyses. Many aligners start by splitting sequences into contiguous, fixed-length seeds, called k-mers. Alignment is faster with longer, unique seeds, but more accurate with shorter seeds avoiding mutations.
Vaccines (Basel)
November 2024
School of Basic Medical Sciences, Anhui Medical University, Hefei 230032, China.
The Middle East Respiratory Syndrome Coronavirus (MERS-CoV) is a highly pathogenic virus causing severe respiratory illness, with limited treatment options that are mostly supportive. The success of mRNA technology in COVID-19 vaccines has opened avenues for antibody development against MERS-CoV. mRNA-based antibodies, expressed in vivo, offer rapid adaptability to viral mutations while minimizing long-term side effects.
Sci Rep
December 2024
Laboratory of Cell Vaccine, Microbial Research Center for Health and Medicine (MRCHM), National Institutes of Biomedical Innovation, Health and Nutrition (NIBIOHN), 7-6-8 Saito-Asagi, Ibaraki-Shi, Osaka, 567-0085, Japan.
Since designer cells are attracting much attention as a new modality in gene and cell therapy, it would be advantageous to develop synthetic receptors that recognize artificial ligands and activate solely signaling molecules of interest. In this study, we refined the construction of our previously developed minimal engineered receptors (MERs) to avoid off-target activation of STAT5 while maintaining on-target activation of signaling molecules corresponding to tyrosine motifs. Among the myristoylated, cytoplasmic, and transmembrane types of MERs, the cytoplasmic type had the highest signaling efficiency, although there was off-target activation of STAT5 upon ligand stimulation.
Curr Protoc
December 2024
Institute of Virology, Medical University of Innsbruck, Innsbruck, Austria.
Antiviral drugs are essential medications to save the lives of infected people. However, they are under constant threat of becoming ineffective as viruses evolve quickly. Studying the development of resistance is therefore paramount to understanding the impact of mutations on pharmacological treatment and to making informed decisions.
PLoS One
December 2024
Laboratory of Antibody Design, Center for Drug Design Research, National Institutes of Biomedical Innovation, Health and Nutrition, Osaka, Japan.
The SARS-CoV-2 pandemic highlighted the potential for significant harm from future cross-species transmission of various animal coronaviruses to humans. There is a significant need for antibody-based drugs to treat patients infected with previously unseen coronaviruses. In this study, we generated CV804, an antibody that binds to the S2 domain of SARS-CoV-2 spike protein, which is highly conserved across the coronavirus family and less susceptible to mutations.