A genome alignment algorithm based on compression.

BMC Bioinformatics

Clayton School of Information Technology, Monash University, Clayton 3800, Australia.

Published: December 2010

Background: Traditional genome alignment methods treat sequence alignment as a variation of the string edit distance problem and align two sequences by matching their characters. They are often computationally expensive and unable to deal with low-information regions. Furthermore, they lack a well-principled objective function for evaluating sets of parameters. Since genomic sequences carry genetic information, this article proposes that the information content of the nucleotide at each position should be considered in sequence alignment. An information-theoretic approach to pairwise local genome alignment, namely XMAligner, is presented. Instead of comparing sequences at the character level, XMAligner considers a pair of nucleotides from two sequences to be related if their mutual information in context is significant. The information content of nucleotides in sequences is measured by a lossless compression technique.
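
The idea of measuring information content by compression can be illustrated with a minimal sketch. It is not the compression model used by XMAligner, which the abstract does not describe; it substitutes a simple adaptive order-k context model with add-one smoothing. The function names `information_content` and `information_gain`, the `order` parameter, and the restriction to the symbols A, C, G and T are all assumptions made for illustration. The cost of a base is the negative log2 of the probability the model assigns to it, and the gain is how many bits are saved at each position of one sequence when the model has first seen the other sequence.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: input contains only these four symbols


def information_content(seq, order=2, background=""):
    """Per-base cost in bits of `seq` under an adaptive order-`order` model.

    Hypothetical stand-in for a real DNA compressor. `background` is an
    optional sequence whose statistics prime the model before `seq` is coded.
    """
    # Every context starts with a count of 1 per symbol (add-one smoothing).
    counts = defaultdict(lambda: dict.fromkeys(ALPHABET, 1))
    for i in range(order, len(background)):
        counts[background[i - order:i]][background[i]] += 1

    bits = []
    for i, base in enumerate(seq):
        context = seq[max(0, i - order):i]  # shorter context at the start
        seen = counts[context]
        bits.append(-math.log2(seen[base] / sum(seen.values())))
        seen[base] += 1  # adapt the model after coding the base
    return bits


def information_gain(x, y, order=2):
    """Bits saved at each position of `y` when the model has first seen `x`."""
    alone = information_content(y, order)
    given = information_content(y, order, background=x)
    return [a - g for a, g in zip(alone, given)]
```

Calling `information_gain(x, y)` returns one value per position of `y`. Positions with a clearly positive gain are those where knowing `x` makes `y` cheaper to encode, which is a crude proxy for the significant mutual information in context that the abstract refers to.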

Results: Experiments on both simulated and real data show that XMAligner is superior to conventional methods, especially on distantly related sequences and statistically biased data. XMAligner can align sequences of eukaryote genome size with only a modest hardware requirement. Importantly, the method has an objective function, which can obviate the need to choose parameter values for high-quality alignment. The alignment results from XMAligner can be loaded into a visualisation tool for viewing.

Conclusions: The information-theoretic approach to sequence alignment is shown to overcome the aforementioned problems of conventional character-matching alignment methods. The article shows that, because genomic sequences carry information, considering the information content of nucleotides is helpful for genomic sequence alignment.

Availability: Downloadable binaries, documentation and data can be found at ftp://ftp.infotech.monash.edu.au/software/DNAcompress-XM/XMAligner/.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3022628
DOI: http://dx.doi.org/10.1186/1471-2105-11-599

Publication Analysis

Top Keywords

sequence alignment: 12
alignment: 9
genome alignment: 8
alignment methods: 8
sequences: 8
objective function: 8
genomic sequences: 8
information-theoretic approach: 8
alignment xmaligner: 8
nucleotides sequences: 8

Similar Publications

Purpose: Pulmonary MRI faces challenges due to low proton density, rapid transverse magnetization decay, and cardiac and respiratory motion. The Fermat-looped orthogonally encoded trajectories (FLORET) sequence addresses these issues with high sampling efficiency, strong signal, and motion robustness, but has not yet been applied to phase-resolved functional lung (PREFUL) MRI, a contrast-free method for assessing pulmonary ventilation during free breathing. This study aims to develop a reconstruction pipeline for FLORET UTE, enhancing spatial resolution for three-dimensional (3D) PREFUL ventilation analysis.

Decoding the m6A epitranscriptomic landscape for biotechnological applications using a direct RNA sequencing approach.

Nat Commun

January 2025

National-Local Joint Engineering Laboratory of Druggability and New Drug Evaluation, National Engineering Research Center for New Drug and Druggability (cultivation), Guangdong Province Key Laboratory of New Drug Design and Evaluation, School of Pharmaceutical Sciences, Sun Yat-Sen University, Guangzhou, 510006, China.

Epitranscriptomic modifications, particularly N6-methyladenosine (m6A), are crucial regulators of gene expression, influencing processes such as RNA stability, splicing, and translation. Traditional computational methods for detecting m6A from Nanopore direct RNA sequencing (DRS) data are constrained by their reliance on experimentally validated labels, often resulting in the underestimation of modification sites. Here, we introduce pum6a, an innovative attention-based framework that integrates positive and unlabeled multi-instance learning (MIL) to address the challenges of incomplete labeling and missing read-level annotations.

Structure and catalytic activity of a dihydrofolate reductase-like enzyme from Leptospira interrogans.

Int J Biol Macromol

January 2025

Center of Excellence for Molecular Biology and Genomics of Shrimp, Department of Biochemistry, Faculty of Science, Chulalongkorn University, Bangkok 10330, Thailand; Center of Excellence in Molecular Crop, Department of Biochemistry, Faculty of Science, Chulalongkorn University, Bangkok 10330, Thailand.

A dihydrofolate reductase (DHFR)-like enzyme from Leptospira interrogans (LiDHFRL) was cloned and the recombinant protein was characterized. Sequence alignment suggested that the enzyme lacked the conserved catalytic residues found in DHFR. Indeed, LiDHFRL did not catalyze the reduction of dihydrofolate by either NADH or NADPH.

The novel allele HLA-DPB1*1617:01 differs from HLA-DPB1*05:01:01:01 by one non-synonymous nucleotide substitution in exon 2.

Novel Allele HLA-B*52:130, Identified by Next-Generation Sequencing.

HLA

January 2025

Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Pirogov Medical University, Moscow, Russia.

The new HLA-B*52:130 allele showed one nonsynonymous nucleotide difference compared to the HLA-B*52:01:01:01 allele in codon 170.
