SGA: a grammar-based alignment algorithm.

Comput Methods Programs Biomed

College of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, PR China.

Published: April 2007

As the cost of genome sequencing continues to drop, comparison of large sequences becomes paramount to our understanding of evolution and gene function. Rapid genome alignment stands to play a fundamental role in furthering biological understanding. In 2002, Shen et al. developed super pairwise alignment (SPA), a fast algorithm based on statistical estimation. The method was shown to be much faster than traditional dynamic programming algorithms, at the cost of a small drop in accuracy. In this paper, we propose a new method based on SPA that targets the analysis of large-scale genomes. The new method, named super genome alignment (SGA), applies Yang-Kieffer coding theory to alignment and results in a grammar-based algorithm. SGA has the same computational complexity as its predecessor SPA, and it can process large-scale genomes. SGA is tested on numerous pairs of microbial and eukaryotic genomes, which serve as the benchmark for comparing it with the competing BLASTZ method. The results show that SGA is faster than BLASTZ by at least an order of magnitude (for some genome pairs the difference is as large as two orders of magnitude), and loses on average only about 1% in alignment similarity.
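The abstract does not say how SGA builds or uses its grammar, so the following is only a minimal sketch of what a grammar-based representation of a sequence looks like. It is not the SGA algorithm and not the specific Yang-Kieffer transform. As a stand-in it uses greedy digram replacement (in the style of Re-Pair), and the names `build_grammar` and `expand` are hypothetical helpers introduced for illustration.

```python
from collections import Counter


def build_grammar(seq):
    """Greedy digram replacement (Re-Pair style), used here as an
    illustrative stand-in for a grammar transform; NOT the SGA method.

    Returns (start, rules): `start` is a list of symbols and `rules`
    maps each new nonterminal (e.g. "R0") to the pair it replaces.
    """
    symbols = list(seq)
    rules = {}
    while True:
        pairs = Counter(zip(symbols, symbols[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # no adjacent pair repeats, nothing left to factor out
        name = f"R{len(rules)}"  # multi-character, so it cannot clash with a base
        rules[name] = pair
        # Rewrite the sequence left to right, replacing non-overlapping
        # occurrences of the chosen pair with the new nonterminal.
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols, rules


def expand(symbols, rules):
    """Expand a symbol list back into the original string."""
    out = []
    stack = list(reversed(symbols))
    while stack:
        s = stack.pop()
        if s in rules:
            left, right = rules[s]
            stack.append(right)  # pushed first so `left` is expanded first
            stack.append(left)
        else:
            out.append(s)
    return "".join(out)


if __name__ == "__main__":
    start, rules = build_grammar("ACGTACGTACGT")
    print(start, rules)
    print(expand(start, rules))
```

The grammar generates exactly the input sequence, so repeated substrings are stored once as rules. How SGA exploits such structure for alignment is described in the paper itself, not in this abstract.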

Source
http://dx.doi.org/10.1016/j.cmpb.2006.12.007
