A linear-time algorithm for finding a maximum-length ORF in a splice graph.

Int J Comput Biol Drug Des

Department of Computer Science, University of Kentucky, Lexington, KY, USA.

Published: November 2012

We present a linear-time, deterministic algorithm for finding a longest Open Reading Frame (ORF) in an alternatively spliced gene represented by a splice graph. Finding protein-encoding regions is a fundamental problem in genomic and transcriptomic analysis, and in some circumstances long ORFs can provide good predictions of such regions. Splice graphs are a common way of compactly representing what may be exponentially many alternative splicings of a sequence. The efficiency of our algorithm is achieved by pruning the search space so as to bound the number of reading frames considered at any vertex of the splice graph. The algorithm guarantees that the unpruned reading frames contain at least one longest ORF of the gene. We are therefore able to find a longest ORF among all splice variants in time linear in the size of the splice graph, even though the number of potential transcripts may be much larger.
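To make the problem statement concrete, the sketch below is a brute-force baseline, not the linear-time algorithm of the paper. It enumerates every source-to-sink path of a small splice graph, concatenates the exon sequences along each path, and scans each resulting transcript for a longest ORF. Several things here are assumptions made for illustration: an ORF is taken to run from an ATG to the first in-frame stop codon (TAA, TAG or TGA), vertices carry exon sequences, the graph is acyclic, and all names (`exons`, `edges`, `longest_orf_in_splice_graph`) are hypothetical. Because it visits every path, this baseline can take time exponential in the size of the graph, which is the cost the pruning approach described above is designed to avoid.

```python
from typing import Dict, Iterator, List

STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def longest_orf(seq: str) -> str:
    """Return a longest ORF (ATG through the first in-frame stop) in seq, or ""."""
    best = ""
    for frame in range(3):
        start = None  # position of the earliest ATG since the last in-frame stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > len(best):
                    best = seq[start:i + 3]
                start = None
    return best


def transcripts(exons: Dict[str, str], edges: Dict[str, List[str]],
                source: str, sink: str) -> Iterator[str]:
    """Yield the concatenated sequence of every source-to-sink path (assumes a DAG)."""
    stack = [(source, exons[source])]
    while stack:
        vertex, seq = stack.pop()
        if vertex == sink:
            yield seq
            continue
        for successor in edges.get(vertex, []):
            stack.append((successor, seq + exons[successor]))


def longest_orf_in_splice_graph(exons: Dict[str, str], edges: Dict[str, List[str]],
                                source: str, sink: str) -> str:
    """Brute force: take the best ORF over all enumerated transcripts."""
    candidates = (longest_orf(t) for t in transcripts(exons, edges, source, sink))
    return max(candidates, key=len, default="")


# Toy splice graph (invented data): exon e2 is optional, giving two transcripts.
exons = {"e1": "GCATGGCC", "e2": "AAATTTGGG", "e3": "CCCTAAGG"}
edges = {"e1": ["e2", "e3"], "e2": ["e3"]}

result = longest_orf_in_splice_graph(exons, edges, "e1", "e3")
print(result, len(result))
```

In this toy graph the transcript that skips e2 contains a 12-nucleotide ORF, while the transcript that includes e2 contains a 21-nucleotide ORF, so the longer one is returned. A graph with k consecutive optional exons has 2^k transcripts, which is why enumerating them does not scale.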

Source
http://dx.doi.org/10.1504/IJCBDD.2012.049212
