The Statistics of k-mers from a Sequence Undergoing a Simple Mutation Process Without Spurious Matches.

J Comput Biol

Department of Computer Science and Engineering, The Pennsylvania State University, University Park, Pennsylvania, USA.

Published: February 2022

AI Article Synopsis

  • k-mer-based methods are common in bioinformatics, but their statistical properties are not fully understood, especially how k-mers behave when the underlying sequence mutates.
  • The study derives the expectation and variance of the number of mutated k-mers, of islands (maximal intervals of mutated k-mers), and of oceans (maximal intervals of nonmutated k-mers).
  • The findings include hypothesis tests and confidence intervals for the mutation rate given an observed number of mutated k-mers, with applications to supplementing the Mash distance point estimate, filtering reads during alignment with Minimap2, and rating long-read alignments with Jabba.
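
The definitions in the synopsis can be made concrete with a small simulation. The sketch below is illustrative only and is not taken from the article: the function names are hypothetical, and it assumes the model as summarized here (each nucleotide mutates independently with probability r, and a k-mer counts as mutated if any of its k positions is mutated). Under that assumption each k-mer is mutated with probability 1 - (1 - r)^k, so by linearity of expectation a sequence with L k-mers has L(1 - (1 - r)^k) mutated k-mers on average.

```python
import random


def mutate_positions(n, r, rng):
    """Return n booleans; True marks a nucleotide mutated with probability r."""
    return [rng.random() < r for _ in range(n)]


def mutated_kmers(mutated, k):
    """Flag each k-mer as mutated if any of the k positions it covers is mutated."""
    prefix = [0]
    for m in mutated:
        prefix.append(prefix[-1] + int(m))
    # k-mer i covers positions i .. i+k-1
    return [prefix[i + k] - prefix[i] > 0 for i in range(len(mutated) - k + 1)]


def count_runs(flags, value):
    """Count maximal runs of `value`: islands for True, oceans for False."""
    runs = 0
    prev = None
    for f in flags:
        if f == value and prev != value:
            runs += 1
        prev = f
    return runs


def expected_mutated_kmers(num_kmers, k, r):
    """Expected number of mutated k-mers under the independence assumption."""
    return num_kmers * (1 - (1 - r) ** k)


if __name__ == "__main__":
    rng = random.Random(0)          # fixed seed, chosen arbitrarily
    n, k, r = 10_000, 21, 0.01      # illustrative parameters, not from the article
    flags = mutated_kmers(mutate_positions(n, r, rng), k)
    print("mutated k-mers:", sum(flags))
    print("expected:      ", expected_mutated_kmers(len(flags), k, r))
    print("islands:", count_runs(flags, True), "oceans:", count_runs(flags, False))
```

The simulation tracks only which positions mutate, not the nucleotides themselves, which mirrors the no-spurious-matches assumption: a mutated k-mer is never counted as matching by chance.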

Article Abstract

k-mer-based methods are widely used in bioinformatics, but there are many gaps in our understanding of their statistical properties. Here, we consider the simple model where a sequence S (e.g., a genome or a read) undergoes a simple mutation process through which each nucleotide is mutated independently with some probability r, under the assumption that there are no spurious k-mer matches. How does this process affect the k-mers of S? We derive the expectation and variance of the number of mutated k-mers and of the number of islands (a maximal interval of mutated k-mers) and oceans (a maximal interval of nonmutated k-mers). We then derive hypothesis tests and confidence intervals (CIs) for r given an observed number of mutated k-mers, or, alternatively, given the Jaccard similarity (with or without MinHash). We demonstrate the usefulness of our results using a few select applications: obtaining a CI to supplement the Mash distance point estimate, filtering out reads during alignment by Minimap2, and rating long-read alignments to a de Bruijn graph by Jabba.
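
As a rough illustration of how an observed count relates back to the mutation probability, the sketch below inverts the expected fraction of mutated k-mers, 1 - (1 - r)^k, to get a point estimate. It is not the article's hypothesis test or confidence interval, and the function names are hypothetical. The Jaccard variant additionally assumes that both sequences contain the same number L of distinct k-mers and share exactly the nonmutated ones, so that J = (L - N) / (L + N) for N mutated k-mers.

```python
def r_from_mutated_count(n_mut, num_kmers, k):
    """Point estimate of the per-nucleotide mutation probability.

    Solves n_mut / num_kmers = 1 - (1 - r)**k for r.
    """
    if not 0 <= n_mut <= num_kmers:
        raise ValueError("n_mut must lie between 0 and num_kmers")
    return 1.0 - (1.0 - n_mut / num_kmers) ** (1.0 / k)


def r_from_jaccard(j, k):
    """Point estimate of r from a Jaccard similarity j in [0, 1].

    Assumes J = (L - N) / (L + N), so the mutated fraction is
    q = (1 - j) / (1 + j), and then inverts q = 1 - (1 - r)**k.
    """
    if not 0 <= j <= 1:
        raise ValueError("j must lie between 0 and 1")
    q = (1.0 - j) / (1.0 + j)
    return 1.0 - (1.0 - q) ** (1.0 / k)
```

For example, `r_from_mutated_count(n_mut, num_kmers, 21)` returns 0 when no k-mers are mutated and 1 when all of them are. A point estimate alone says nothing about uncertainty, which is the gap the article's variance results and CIs are meant to fill.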

Source
http://dx.doi.org/10.1089/cmb.2021.0431

Similar Publications

Sequence alignment is foundational to many bioinformatic analyses. Many aligners start by splitting sequences into contiguous, fixed-length seeds, called k-mers. Alignment is faster with longer, unique seeds, but more accurate with shorter seeds avoiding mutations.

The Middle East Respiratory Syndrome Coronavirus (MERS-CoV) is a highly pathogenic virus causing severe respiratory illness, with limited treatment options that are mostly supportive. The success of mRNA technology in COVID-19 vaccines has opened avenues for antibody development against MERS-CoV. mRNA-based antibodies, expressed in vivo, offer rapid adaptability to viral mutations while minimizing long-term side effects.

Refining minimal engineered receptors for specific activation of on-target signaling molecules.

Sci Rep

December 2024

Laboratory of Cell Vaccine, Microbial Research Center for Health and Medicine (MRCHM), National Institutes of Biomedical Innovation, Health and Nutrition (NIBIOHN), 7-6-8 Saito-Asagi, Ibaraki-Shi, Osaka, 567-0085, Japan.

Since designer cells are attracting much attention as a new modality in gene and cell therapy, it would be advantageous to develop synthetic receptors that recognize artificial ligands and activate solely signaling molecules of interest. In this study, we refined the construction of our previously developed minimal engineered receptors (MERs) to avoid off-target activation of STAT5 while maintaining on-target activation of signaling molecules corresponding to tyrosine motifs. Among the myristoylated, cytoplasmic, and transmembrane types of MERs, the cytoplasmic type had the highest signaling efficiency, although there was off-target activation of STAT5 upon ligand stimulation.

Antiviral drugs are essential medications to save the lives of infected people. However, they are under constant threat to become ineffective as viruses evolve quickly. Studying the development of resistance is therefore paramount to understand the impact of mutations on pharmacological treatment and to make informed decisions.

The SARS-CoV-2 pandemic drew attention to the potential for significant harm from future cross-species transmission of animal coronaviruses to humans. There is a significant need for antibody-based drugs to treat patients infected with previously unseen coronaviruses. In this study, we generated CV804, an antibody that binds to the S2 domain of the SARS-CoV-2 spike protein, which is highly conserved across the coronavirus family and less susceptible to mutations.
