As the cost of genome sequencing continues to drop, comparison of large sequences becomes paramount to our understanding of evolution and gene function. Rapid genome alignment stands to play a fundamental role in furthering biological understanding. In 2002, a fast algorithm based on statistical estimation, called super pairwise alignment (SPA), was developed by Shen et al. The method proved to be much faster than traditional dynamic programming algorithms, at the cost of a small drop in accuracy. In this paper, we propose a new method based on SPA that targets the analysis of large-scale genomes. The new method, named super genome alignment (SGA), applies Yang-Kieffer coding theory to alignment and results in a grammar-based algorithm. SGA has the same computational complexity as its predecessor SPA, and it can process large-scale genomes. SGA is tested on numerous pairs of microbial and eukaryotic genomes, which serve as the benchmark for comparing it with the competing BLASTZ method. The results show that SGA is faster than BLASTZ by at least an order of magnitude (for some genome pairs the difference is as large as two orders of magnitude), and suffers on average only about 1% loss in alignment similarity.
DOI: http://dx.doi.org/10.1016/j.cmpb.2006.12.007
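For context, the "traditional dynamic programming algorithms" mentioned in the abstract fill a table whose size is the product of the two sequence lengths, which is what becomes impractical at genome scale. The sketch below illustrates that baseline only: it is a Needleman-Wunsch global alignment score, not SPA or SGA, and the function name and the match, mismatch, and gap values are illustrative assumptions rather than anything taken from the paper.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score.

    Runs in O(len(a) * len(b)) time; only two rows of the table are kept,
    so memory is O(len(b)). Scoring values are illustrative assumptions.
    """
    # Row 0: aligning the empty prefix of `a` against prefixes of `b`.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against the empty prefix of `b`.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr[j] = max(diag, up, left)
        prev = curr
    return prev[len(b)]


if __name__ == "__main__":
    print(global_alignment_score("GATTACA", "GATTACA"))
    print(global_alignment_score("ACGT", "AGT"))
```

Because the inner loop visits every pair of positions, two sequences of a few million bases each already imply trillions of cell updates, which is the cost that faster heuristic methods aim to avoid.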
Typical high-throughput single-cell RNA-sequencing (scRNA-seq) analyses are primarily conducted by (pseudo)alignment, through the lens of annotated gene models, and aimed at detecting differential gene expression. This misses diversity generated by other mechanisms that diversify the transcriptome such as splicing and V(D)J recombination, and is blind to sequences missing from imperfect reference genomes. Here, we present sc-SPLASH, a highly efficient pipeline that extends our SPLASH framework for statistics-first, reference-free discovery to barcoded scRNA-seq (10x Chromium) and spatial transcriptomics (10x Visium); we also provide its optimized module for preprocessing and k-mer counting in barcoded data, BKC, as a standalone tool.
RNA-Seq analysis has become a routine task in numerous genomic research labs, driven by the reduced cost of bulk RNA sequencing experiments. These generate billions of reads that require accurate, efficient, effective, and reproducible analysis. But the time required for comprehensive analysis remains a bottleneck.
Bacterial genomes exhibit significant variation in gene content and sequence identity. Pangenome analyses explore this diversity by classifying genes into core and accessory clusters of orthologous groups (COGs). However, strict sequence identity cutoffs can misclassify divergent alleles as different genes, inflating accessory gene counts.
We investigated small non-coding RNAs (sncRNAs) from the prefrontal cortex of 93 individuals diagnosed with schizophrenia (SCZ) or bipolar disorder (BD) and 77 controls. We uncovered recurring complex sncRNA profiles, with 98% of all sncRNAs being accounted for by miRNA isoforms (60.6%), tRNA-derived fragments (17.
Mol Breed
January 2025
Maize Research Institute, Guangxi Academy of Agricultural Sciences, Nanning, 530007 Guangxi China.
Increasing planting density is one of the most important strategies for generating higher maize yields. Moderate leaf rolling decreases mutual shading of leaves and increases the photosynthesis of the population and hence increases the tolerance for high-density planting. Few genes that control leaf rolling in maize have been identified, however, and their applicability for breeding programs remains unclear.