A common class of transcripts with 5'-intron depletion, distinct early coding sequence features, and N1-methyladenosine modification.

Can Cenik, Hon Nian Chua, Guramrit Singh, Abdalla Akef, Michael P Snyder, Alexander F Palazzo, Melissa J Moore, Frederick P Roth

RNA

Donnelly Centre, Department of Molecular Genetics, and Department of Computer Science, University of Toronto, Toronto M5S 3E1, Ontario, Canada

Published: March 2017

- Introns in the 5' untranslated regions (5'UTRs) are present in about 35% of human transcripts, but genes encoding secreted, membrane-bound and mitochondrial proteins are less likely to have them.
- A classifier developed for this study predicts the status of 5'UTR introns with over 80% accuracy based on early coding region sequences, identifying a group of transcripts termed "5IM" that show distinct characteristics.
- Transcripts in the 5IM class, which make up around 20% of human transcripts, are enriched for non-AUG start codons, more extensive secondary structure near the start codon and the 5' cap, greater dependence on eIF4E for translation, and noncanonical 5' proximal binding by the exon junction complex (EJC), indicating a

Introns are found in 5' untranslated regions (5'UTRs) for 35% of all human transcripts. These 5'UTR introns are not randomly distributed: genes that encode secreted, membrane-bound and mitochondrial proteins are less likely to have them. Curiously, transcripts lacking 5'UTR introns tend to harbor specific RNA sequence elements in their early coding regions. To model and understand the connection between coding-region sequence and 5'UTR intron status, we developed a classifier that can predict 5'UTR intron status with >80% accuracy using only sequence features in the early coding region. Thus, the classifier identifies transcripts with 5' proximal-intron-minus-like coding regions ("5IM" transcripts). Unexpectedly, we found that the early coding sequence features defining 5IM transcripts are widespread, appearing in 21% of all human RefSeq transcripts. The 5IM class of transcripts is enriched for non-AUG start codons, more extensive secondary structure both preceding the start codon and near the 5' cap, greater dependence on eIF4E for translation, and association with ER-proximal ribosomes. 5IM transcripts are bound by the exon junction complex (EJC) at noncanonical 5' proximal positions. Finally, N1-methyladenosines are specifically enriched in the early coding regions of 5IM transcripts. Taken together, our analyses point to the existence of a distinct 5IM class comprising ∼20% of human transcripts. This class is defined by depletion of 5' proximal introns, presence of specific RNA sequence features associated with low translation efficiency, N1-methyladenosines in the early coding region, and enrichment for noncanonical binding by the EJC.
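The abstract does not say which model or features the classifier uses, so the following is only a minimal sketch of the general idea: scoring a transcript's early coding region by how closely its sequence composition resembles that of transcripts lacking 5'UTR introns. A simple k-mer log-odds (naive-Bayes-style) scorer stands in for the study's classifier here. The window length `EARLY_NT`, the k-mer length `K`, all function names, and the toy sequences are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter
from itertools import product
import math

K = 3          # hypothetical k-mer length (assumption, not from the paper)
EARLY_NT = 99  # hypothetical "early coding region" window in nucleotides

def kmer_counts(cds, k=K, window=EARLY_NT):
    """Count overlapping k-mers in the first `window` nucleotides of a CDS."""
    early = cds.upper()[:window]
    return Counter(early[i:i + k] for i in range(len(early) - k + 1))

def train_log_odds(seqs_without_intron, seqs_with_intron, k=K, pseudo=1.0):
    """Per-k-mer log-odds weights with additive smoothing.

    Positive weight: the k-mer is relatively more frequent in the early
    coding regions of transcripts WITHOUT a 5'UTR intron.
    """
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]

    def freqs(seqs):
        total = Counter()
        for s in seqs:
            total.update(kmer_counts(s, k))
        n = sum(total[w] for w in vocab) + pseudo * len(vocab)
        return {w: (total[w] + pseudo) / n for w in vocab}

    f_minus = freqs(seqs_without_intron)
    f_plus = freqs(seqs_with_intron)
    return {w: math.log(f_minus[w] / f_plus[w]) for w in vocab}

def score(cds, weights, k=K):
    """Sum of log-odds over early-region k-mers; > 0 means 'intron-minus-like'."""
    return sum(weights.get(w, 0.0) * c for w, c in kmer_counts(cds, k).items())

# Toy, invented sequences purely for illustration (not real transcripts).
minus_like = ["ATGGCGGCGCTGCTGCTGGCG", "ATGCTGCTGGCGGCCCTGCTG"]
plus_like = ["ATGAAAGATGAAGAAAAAGAT", "ATGGAAGATAAAGAAGATGAA"]

weights = train_log_odds(minus_like, plus_like)
print(score("ATGCTGGCGCTGCTG", weights) > 0)
print(score("ATGGAAAAAGATGAA", weights) > 0)
```

Under this sketch, a transcript that carries a 5'UTR intron yet scores above zero would be the analogue of a "5IM" call: its early coding region looks like those of intron-minus transcripts. A real analysis would need held-out evaluation to check any accuracy figure.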

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5311483
DOI: http://dx.doi.org/10.1261/rna.059105.116

