Sequences from organisms that share a common evolutionary origin are similar, whereas sequences that are unique to one organism and absent from its relatives are good candidates for diagnostic markers. However, identifying dissimilar regions among closely related organisms is challenging because it requires complex multiple sequence alignments, which are difficult to compute and parse. To address this, we have developed NAUniSeq, a biologically inspired, universal algorithm that finds unique sequences for microorganism diagnosis by traveling through the phylogeny of life. Mapping through a phylogenetic tree keeps cross-contamination and false positives low. We downloaded complete taxonomy data from the Taxadb database and sequence data from the National Center for Biotechnology Information Reference Sequence Database (NCBI RefSeq) and, with the help of NetworkX, built a phylogenetic tree. Sequences were assigned to the graph nodes, k-mers were created for the target and non-target nodes, and the graph was searched using the depth-first search algorithm. In a memory-efficient alternative NoSQL approach, we created a collection of RefSeq sequences in a MongoDB database, keyed by tax-id and the path of each FASTA file, and queried this collection for the target and non-target sequences. In both approaches, we used an alignment-free, sliding-window k-mer procedure that quickly compares the k-mers of target and non-target sequences and returns the unique sequences that are not present in the non-target set. We validated the algorithm on the target nodes Mycobacterium tuberculosis, Neisseria gonorrhoeae, and Monkeypox and generated unique sequences. This universal algorithm is a powerful tool for generating diagnostic sequences, enabling the accurate identification of microbial strains with high phylogenetic precision.
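
The core idea described above (a depth-first walk over a taxonomy tree followed by an alignment-free k-mer comparison) can be illustrated with a minimal sketch. This is not the NAUniSeq implementation: the function names, the toy tree, the toy sequences, and the choice to treat every node outside the target's subtree as non-target are all assumptions made for illustration, and a plain dictionary stands in for the NetworkX graph.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return all length-k substrings of seq (sliding window, step 1)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def subtree(tree: Dict[str, List[str]], start: str) -> Set[str]:
    """Iterative depth-first search: start node plus all its descendants."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(tree.get(node, []))
    return seen


def unique_kmers(tree: Dict[str, List[str]], seqs: Dict[str, str],
                 root: str, target: str, k: int) -> Set[str]:
    """k-mers found under the target node but in no other node of the tree.

    Assumption: every node outside the target's subtree counts as non-target.
    """
    target_nodes = subtree(tree, target)
    target_set: Set[str] = set()
    non_target_set: Set[str] = set()
    for node in subtree(tree, root):
        seq = seqs.get(node)
        if seq is None:
            continue  # internal nodes carry no sequence in this toy example
        if node in target_nodes:
            target_set |= kmers(seq, k)
        else:
            non_target_set |= kmers(seq, k)
    return target_set - non_target_set


# Hypothetical toy taxonomy (parent -> children) and made-up sequences.
toy_tree = {"root": ["genusA", "genusB"],
            "genusA": ["sp1", "sp2"],
            "genusB": ["sp3"]}
toy_seqs = {"sp1": "ACGTACGGA", "sp2": "ACGTTTGCA", "sp3": "TTGCACGTA"}

result = unique_kmers(toy_tree, toy_seqs, root="root", target="sp1", k=4)
print(sorted(result))  # ['ACGG', 'CGGA', 'GTAC', 'TACG']
```

Holding k-mers in Python sets makes the target versus non-target comparison a single set difference, which is why no alignment is needed. For real genomes the non-target set can become very large, which is presumably the motivation for the memory-efficient database-backed variant mentioned in the abstract.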

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11497845
DOI: http://dx.doi.org/10.1093/bib/bbae545
