Motivation: A consensus sequence for a family of related sequences is, as the name suggests, a sequence that captures the features common to most members of the family. Consensus sequences are important in various DNA sequencing applications and are a convenient way to characterize a family of molecules.

Results: This paper describes a new algorithm for finding a consensus sequence, using the popular optimization method known as simulated annealing. Unlike the conventional approach of finding a consensus sequence by first forming a multiple sequence alignment, this algorithm searches for a sequence that minimises the sum of pairwise distances to each of the input sequences. The resulting consensus sequence can then be used to induce a multiple sequence alignment. The time required by the algorithm scales linearly with the number of input sequences and quadratically with the length of the consensus sequence. We present results demonstrating the high quality of the consensus sequences and alignments produced by the new algorithm. For comparison, we also present similar results obtained using ClustalW. The new algorithm outperforms ClustalW in many cases.
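
The search described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the distance measure, the move set, or the cooling schedule, so the sketch assumes unit-cost edit (Levenshtein) distance, random single-character substitutions, insertions, and deletions, and a geometric cooling schedule. All function names and parameter values (`anneal_consensus`, `steps`, `t_start`, `t_end`) are hypothetical.

```python
import math
import random


def edit_distance(a, b):
    """Levenshtein distance via the standard O(len(a) * len(b)) dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # match or substitute
        prev = curr
    return prev[-1]


def total_distance(candidate, seqs):
    """Objective: sum of distances from the candidate to every input sequence."""
    return sum(edit_distance(candidate, s) for s in seqs)


def propose(candidate, alphabet, rng):
    """Assumed move set: one random substitution, insertion, or deletion."""
    move = rng.choice(("sub", "ins", "del"))
    if move == "ins" or not candidate:
        pos = rng.randrange(len(candidate) + 1)
        return candidate[:pos] + rng.choice(alphabet) + candidate[pos:]
    pos = rng.randrange(len(candidate))
    if move == "del":
        return candidate[:pos] + candidate[pos + 1:]
    return candidate[:pos] + rng.choice(alphabet) + candidate[pos + 1:]


def anneal_consensus(seqs, alphabet="ACGT", steps=2000,
                     t_start=2.0, t_end=0.01, seed=0):
    """Simulated annealing over candidate consensus strings (illustrative only)."""
    rng = random.Random(seed)
    # Start from the input sequence that is already closest to all the others.
    current = min(seqs, key=lambda s: total_distance(s, seqs))
    current_cost = total_distance(current, seqs)
    best, best_cost = current, current_cost
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        cand = propose(current, alphabet, rng)
        cand_cost = total_distance(cand, seqs)
        delta = cand_cost - current_cost
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = cand, cand_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
    return best, best_cost


seqs = ["ACGTACGT", "ACGTTCGT", "ACGAACGT", "ACGTACG"]
consensus, cost = anneal_consensus(seqs)
print(consensus, cost)
```

In this sketch, each evaluation of the objective runs one quadratic-time dynamic program per input sequence, which is consistent with the scaling stated in the abstract: linear in the number of input sequences and quadratic in the sequence length.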

Source
http://dx.doi.org/10.1093/bioinformatics/18.11.1494


