ORMAN: optimal resolution of ambiguous RNA-Seq multimappings in the presence of novel isoforms.

Bioinformatics

School of Computing Science, Simon Fraser University, Burnaby, BC, Canada, Department of Genome Sciences, University of Washington, Seattle, WA, USA, Vancouver Prostate Centre & Department of Urologic Sciences, University of British Columbia, Vancouver, BC, Canada and Division of Computer Science, School of Informatics and Computing, Indiana University, Bloomington, IN, USA.

Published: March 2014

Motivation: RNA-Seq technology promises to uncover many novel alternative splicing events, gene fusions and other variations in RNA transcripts. For accurate detection and quantification of transcripts, it is important to resolve the mapping ambiguity for those RNA-Seq reads that can be mapped to multiple loci: >17% of the reads from mouse RNA-Seq data and 50% of the reads from some plant RNA-Seq data have multiple mapping loci. In this study, we show how to resolve the mapping ambiguity in the presence of novel transcriptomic events, such as exon skipping and novel indels, to enable accurate downstream analysis. We introduce ORMAN (Optimal Resolution of Multimapping Ambiguity of RNA-Seq Reads), which aims to compute the minimum number of potential transcript products for each gene and to assign each multimapping read to one of these transcripts based on the estimated distribution of the region covering the read. ORMAN achieves this objective through a combinatorial optimization formulation, which is solved through well-known approximation algorithms, integer linear programs and heuristics.
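To give a feel for the kind of combinatorial problem described above, the following is a minimal sketch of a greedy set-cover approximation: it picks a small set of candidate transcripts that together explain every read, then assigns each read to one chosen transcript. This is an illustration only and not ORMAN's actual formulation or code. The function names, the transcript and read identifiers, and the "first chosen transcript wins" assignment rule are all assumptions made for the example; in particular, the sketch ignores the coverage-distribution criterion that the abstract mentions.

```python
from typing import Dict, List, Set


def greedy_transcript_cover(candidates: Dict[str, Set[str]]) -> List[str]:
    """Greedily choose transcripts until every read is explained.

    `candidates` maps a hypothetical transcript id to the set of read ids
    compatible with it. This is the classic greedy set-cover heuristic,
    used here as a stand-in for the optimization in the abstract.
    """
    uncovered: Set[str] = set().union(*candidates.values())
    chosen: List[str] = []
    while uncovered:
        # Pick the transcript explaining the most still-uncovered reads;
        # sorting the ids makes tie-breaking deterministic.
        best = max(sorted(candidates), key=lambda t: len(candidates[t] & uncovered))
        chosen.append(best)
        uncovered -= candidates[best]
    return chosen


def assign_reads(candidates: Dict[str, Set[str]], chosen: List[str]) -> Dict[str, str]:
    """Assign each read to the first chosen transcript compatible with it.

    Assumption: a simplistic rule for illustration; it does not model
    the estimated coverage distribution.
    """
    assignment: Dict[str, str] = {}
    for transcript in chosen:
        for read in sorted(candidates[transcript]):
            assignment.setdefault(read, transcript)
    return assignment


# Toy input: four hypothetical transcripts and four hypothetical reads.
toy_candidates = {
    "T1": {"r1", "r2", "r3"},
    "T2": {"r3", "r4"},
    "T3": {"r4"},
    "T4": {"r1"},
}
toy_chosen = greedy_transcript_cover(toy_candidates)
toy_assignment = assign_reads(toy_candidates, toy_chosen)
print(toy_chosen)
print(toy_assignment)
```

In this toy input, read `r3` is compatible with both `T1` and `T2`, which mirrors a multimapping read; the sketch resolves it in favour of the transcript selected first. A greedy cover is only an approximation of the minimum, which is why exact approaches such as integer linear programs are also used for this class of problem.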

Results: On a simulated RNA-Seq dataset including a random subset of transcripts from the UCSC database, the performance of several state-of-the-art methods for identifying and quantifying novel transcripts, such as Cufflinks, IsoLasso and CLIIQ, is significantly improved through the use of ORMAN. Furthermore, in an experiment using real RNA-Seq reads, we show that ORMAN is able to resolve multimapping to produce coverage values that are similar to the original distribution, even in genes with highly non-uniform coverage.

Availability: ORMAN is available at http://orman.sf.net

Source
http://dx.doi.org/10.1093/bioinformatics/btt591
