Proteogenomic database construction driven from large scale RNA-seq data.

J Proteome Res

Department of Electrical and Computing Engineering, Department of Bioinformatics and Systems Biology, and Department of Computer Science, University of California, San Diego, La Jolla, California 92093, United States.

Published: January 2014

The advent of inexpensive RNA-seq and other deep sequencing technologies for RNA promises to radically improve genomic annotation by providing information on transcribed regions and splicing events in a variety of cellular conditions. Using MS-based proteogenomics, many of these events can be confirmed directly at the protein level. However, integrating large amounts of redundant RNA-seq data with mass spectrometry data is a challenging problem. Our paper addresses this by constructing a compact database that contains all useful information expressed in RNA-seq reads. Applying our method to cumulative C. elegans data reduced 496.2 GB of aligned RNA-seq SAM files to a 410 MB splice graph database written in FASTA format. This corresponds to roughly 1000× compression of data size, without loss of sensitivity. We performed a proteogenomics study on the custom data set using a completely automated pipeline and identified a total of 4044 novel events, including 215 novel genes, 808 novel exons, 12 alternative splicings, 618 gene-boundary corrections, 245 exon-boundary changes, 938 frame shifts, 1166 reverse strands, and 42 translated UTRs. Our results highlight the usefulness of integrating transcriptomic and proteomic data for improved genome annotations.
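The figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the variable names are hypothetical, and it assumes decimal units (1 GB = 1000 MB), which the abstract does not specify.

```python
# Sizes quoted in the abstract.
sam_size_gb = 496.2    # aligned RNA-seq SAM files
fasta_size_mb = 410.0  # splice graph database in FASTA format

# Assumption: decimal units (1 GB = 1000 MB); the abstract does not say.
MB_PER_GB = 1000

compression_ratio = sam_size_gb * MB_PER_GB / fasta_size_mb

# Novel event counts by category, as listed in the abstract.
novel_events = {
    "novel genes": 215,
    "novel exons": 808,
    "alternative splicings": 12,
    "gene-boundary corrections": 618,
    "exon-boundary changes": 245,
    "frame shifts": 938,
    "reverse strands": 1166,
    "translated UTRs": 42,
}
total_events = sum(novel_events.values())

print(f"compression ratio: about {compression_ratio:.0f}x")
print(f"total novel events: {total_events}")
```

Under the decimal-unit assumption the ratio comes out near 1200, consistent with the order-of-magnitude figure of 1000× in the abstract, and the per-category counts sum to the reported total of 4044.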

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4034692
DOI: http://dx.doi.org/10.1021/pr400294c

Similar Publications

Cell clustering is an essential step in uncovering cellular architectures in single-cell RNA-sequencing (scRNA-seq) data. However, existing cell clustering approaches are not well designed to dissect complex structures of cellular landscapes at a finer resolution. Here, we develop a multi-scale clustering (MSC) approach that constructs a sparse cell-cell correlation network for identifying de novo cell types and subtypes at multiscale resolution in an unsupervised manner.

Typical high-throughput single-cell RNA-sequencing (scRNA-seq) analyses are primarily conducted by (pseudo)alignment, through the lens of annotated gene models, and aimed at detecting differential gene expression. This misses diversity generated by other mechanisms that diversify the transcriptome, such as splicing and V(D)J recombination, and is blind to sequences missing from imperfect reference genomes. Here, we present sc-SPLASH, a highly efficient pipeline that extends our SPLASH framework for statistics-first, reference-free discovery to barcoded scRNA-seq (10x Chromium) and spatial transcriptomics (10x Visium); we also provide its optimized module for preprocessing and k-mer counting in barcoded data, BKC, as a standalone tool.

Purpose: The development of endocrine resistance remains a significant challenge in the clinical management of estrogen receptor-positive (ER+) breast cancer. Metabolic reprogramming is a prominent component of endocrine resistance and a potential therapeutic intervention point. However, a limited understanding of which metabolic changes are conserved across the heterogeneous landscape of ER+ breast cancer, and of how metabolic changes factor into ER DNA binding patterns, hinders our ability to target metabolic adaptation as a treatment strategy.

Background: Bispecific T cell-engagers (BTEs) are engineered antibodies that redirect T cells to target antigen-expressing tumors. BTEs targeting various tumor-specific antigens, like interleukin 13 receptor alpha 2 (IL13RA2) and EGFRvIII, have been developed for glioblastoma (GBM). However, limited knowledge of BTE actions derived from studies conducted in immunocompromised animal models impedes progress in the field.

Comprehensive analysis of scRNA-seq and bulk RNA-seq reveals the non-cardiomyocytes heterogeneity and novel cell populations in dilated cardiomyopathy.

J Transl Med

January 2025

State Key Laboratory of Cardiovascular Diseases and Medical Innovation Center, School of Medicine, Shanghai East Hospital, Tongji University, Shanghai, 200120, China.

Background: Dilated cardiomyopathy (DCM) is one of the most common causes of heart failure. Infiltration of and alterations in non-cardiomyocytes of the human heart are crucially involved in the occurrence of DCM and in associated immunotherapeutic approaches.

Methods: We constructed a single-cell transcriptional atlas of DCM and normal patients.
