In kinetoplastids, protein-coding genes are transcribed polycistronically by RNA polymerase II. Individual mature mRNAs are generated from the polycistronic precursors by 5' trans splicing of a 39-nt capped leader RNA and by 3' polyadenylation. It was previously known that trans splicing generally occurs at an AG dinucleotide downstream of a polypyrimidine tract, and that polyadenylation is coupled to downstream trans splicing. The few polyadenylation sites that had been examined were 100-400 nt upstream of the polypyrimidine tract that marked the adjacent trans splice site. We wished to define the sequence requirements for trypanosome mRNA processing more precisely and to generate a predictive algorithm. By scanning all available Trypanosoma brucei cDNAs for splicing and polyadenylation sites, we found that trans splicing generally occurs at the first AG following a polypyrimidine tract of 8-25 nt, giving rise to 5'-UTRs with a median length of 68 nt. We also found that, in general, polyadenylation occurs at a position with one or more A residues located between 80 and 140 nt from the downstream polypyrimidine tract. These data were used to calibrate the free parameters of a grammar model with distance constraints, enabling prediction of polyadenylation and trans splice sites for most protein-coding genes in the trypanosome genome. The data from the genome analysis and the program are available from: .
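The two empirical rules stated above (first AG after a polypyrimidine tract of 8-25 nt; an A residue 80-140 nt upstream of that tract) can be illustrated with a minimal Python sketch. This is not the grammar model with calibrated parameters described in the abstract, only a simplified rule-based scan. The function name `predict_sites`, the definition of a tract as an uninterrupted run of C/T, and the tie-break of taking the A closest to the tract are all assumptions made for illustration.

```python
import re

# Assumption: a polypyrimidine tract is an uninterrupted run of 8-25 C/T residues.
PPT = re.compile(r"[CT]{8,25}")


def predict_sites(seq, min_dist=80, max_dist=140):
    """Scan a DNA/RNA sequence and return one dict per polypyrimidine tract.

    Each dict holds 0-based positions:
      tract_start, tract_end (end exclusive),
      ag_pos   : index of the 'A' of the first AG after the tract,
      polya_pos: index of an 'A' lying min_dist..max_dist nt upstream of
                 tract_start, or None if the window contains no A.
    """
    seq = seq.upper().replace("U", "T")
    results = []
    for m in PPT.finditer(seq):
        # Rule 1: trans splicing at the first AG following the tract.
        ag = seq.find("AG", m.end())
        if ag == -1:
            continue  # no acceptor dinucleotide downstream of this tract

        # Rule 2: polyadenylation at an A residue 80-140 nt upstream of the tract.
        hi = m.start() - min_dist          # nearest allowed position
        lo = max(0, m.start() - max_dist)  # farthest allowed position
        polya = None
        if hi >= 0:
            # Assumption: when several A residues qualify, take the one
            # closest to the tract.
            for i in range(hi, lo - 1, -1):
                if seq[i] == "A":
                    polya = i
                    break

        results.append({
            "tract_start": m.start(),
            "tract_end": m.end(),
            "ag_pos": ag,
            "polya_pos": polya,
        })
    return results


# Toy input: one A at index 20, a 10-nt tract starting at index 120,
# and the first AG two nucleotides after the tract.
example = "G" * 20 + "A" + "G" * 99 + "CTCTCTCTCT" + "GGAG" + "ATGCCC"
sites = predict_sites(example)
```

In the toy input the single A lies 100 nt upstream of the tract, so it falls inside the default 80-140 nt window. A real predictor would need to score competing tracts and candidate sites rather than report every match, which is the role the calibrated distance constraints play in the model described above.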
DOI: http://dx.doi.org/10.1016/j.molbiopara.2005.05.008