SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics.

Bioinformatics

Bioinformatics, Department of Computer Science, University of Freiburg, Freiburg, Germany, Centre for Biological Systems Analysis (ZBSA), University of Freiburg, Freiburg, Germany, Centre for Non-coding RNA in Technology and Health, University of Copenhagen, Copenhagen, Denmark and Centre for Biological Signalling Studies (BIOSS), University of Freiburg, Freiburg, Germany.

Published: August 2015

Motivation: RNA-Seq experiments have revealed a multitude of novel ncRNAs. The gold standard for their analysis, based on simultaneous alignment and folding, suffers from an extreme time complexity of O(n^6). Subsequently, numerous faster 'Sankoff-style' approaches have been suggested. Commonly, the performance of such methods relies on sequence-based heuristics that restrict the search space to optimal or near-optimal sequence alignments; however, the accuracy of sequence-based methods breaks down for RNAs with sequence identities below 60%. Alignment approaches like LocARNA that do not require sequence-based heuristics have been limited to high complexity (≥ quartic time).

Results: Breaking this barrier, we introduce the novel Sankoff-style algorithm 'sparsified prediction and alignment of RNAs based on their structure ensembles (SPARSE)', which runs in quadratic time without sequence-based heuristics. To achieve this low complexity, on par with sequence alignment algorithms, SPARSE features strong sparsification based on structural properties of the RNA ensembles. Following PMcomp, SPARSE gains further speed-up from lightweight energy computation. Although all existing lightweight Sankoff-style methods restrict Sankoff's original model by disallowing loop deletions and insertions, SPARSE transfers the Sankoff algorithm to the lightweight energy model completely for the first time. Compared with LocARNA, SPARSE achieves similar alignment quality and better folding quality in significantly less time (speedup: 3.7). At similar run-time, it aligns low sequence identity instances substantially more accurately than RAF, which uses sequence-based heuristics.
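
The abstract does not spell out SPARSE's sparsification criteria, so the following is only a toy illustration of the general idea of ensemble-based sparsification: discarding base pairs that are improbable in the structure ensemble shrinks the number of base-pair combinations a Sankoff-style dynamic program must score. The function names, the threshold value and the probabilities are all hypothetical and are not taken from SPARSE or any other tool.

```python
def filter_base_pairs(pair_probs, threshold=0.01):
    """Keep only base pairs (i, j) whose ensemble probability is >= threshold.

    pair_probs maps (i, j) index pairs to base-pair probabilities.
    The default threshold is an arbitrary illustrative value.
    """
    return {pair: p for pair, p in pair_probs.items() if p >= threshold}


def count_pair_combinations(pairs_a, pairs_b):
    """Count combinations of one base pair from RNA A with one from RNA B.

    This is a crude proxy for the work of scoring base-pair matches;
    it is not the actual recursion of any published algorithm.
    """
    return len(pairs_a) * len(pairs_b)


# Made-up base-pair probabilities for two short RNAs (illustration only).
probs_a = {(1, 20): 0.95, (2, 19): 0.90, (3, 18): 0.004, (5, 12): 0.0003}
probs_b = {(1, 22): 0.97, (2, 21): 0.002, (4, 15): 0.60}

kept_a = filter_base_pairs(probs_a)
kept_b = filter_base_pairs(probs_b)

dense = count_pair_combinations(probs_a, probs_b)   # all candidate pairs
sparse = count_pair_combinations(kept_a, kept_b)    # only probable pairs

print(f"combinations without filtering: {dense}")
print(f"combinations after filtering:   {sparse}")
```

The filter uses only structural information (ensemble probabilities), not a sequence alignment, which is the property the abstract emphasises.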


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4514930
DOI: http://dx.doi.org/10.1093/bioinformatics/btv185
