Background: Polyadenylation is present in all three domains of life, making it the most conserved post-transcriptional process compared with splicing and 5'-capping. Even though most mammalian poly(A) sites contain a highly conserved hexanucleotide in the upstream region and a far less conserved U/GU-rich sequence in the downstream region, there are many exceptions. Furthermore, poly(A) sites in other species, such as plants and invertebrates, exhibit high deviation from this genomic structure, making the construction of a general poly(A) site recognition model challenging. We surveyed nine poly(A) site prediction methods published between 1999 and 2011. All methods exploit the skewed nucleotide profile across the poly(A) sites, and the highly conserved poly(A) signal as the primary features for recognition. These methods typically use a large number of features, which increases the dimensionality of the models to crippling degrees, and typically are not validated against many kinds of genomes.
Results: We propose a poly(A) site model that employs minimal features to capture the essence of poly(A) sites, yet produces better prediction accuracy across diverse species. Our model consists of three di- or trinucleotide profiles identified through principal component analysis, and the predicted nucleosome occupancy flanking the poly(A) sites. We validated our model using two machine learning methods: logistic regression and linear discriminant analysis. Results show that the models achieve 85-92% sensitivity and 85-96% specificity in seven animals and plants. When we applied a model trained on one species to predict poly(A) sites in other species, the sensitivity scores correlated with phylogenetic distance.
Conclusions: A four-feature model geared towards small motifs was sufficient to accurately learn and predict poly(A) sites across eukaryotes.
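As a rough illustration of two ideas in the abstract, the sketch below computes a positional profile for one short motif across aligned sequences and then scores binary predictions by sensitivity and specificity. It is not the authors' implementation: the function names `kmer_position_profile` and `sensitivity_specificity`, the toy sequences, and the choice of the dinucleotide `TA` are all assumptions made for the example, and the abstract does not say how the profiles were actually computed.

```python
def kmer_position_profile(sequences, kmer):
    """Return, for each start position, the fraction of sequences
    in which `kmer` begins at that position.

    Assumes the sequences are already aligned on the poly(A) site
    and share one length (a hypothetical preprocessing step).
    """
    if not sequences:
        raise ValueError("need at least one sequence")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must all have the same length")
    k = len(kmer)
    n_positions = length - k + 1
    counts = [0] * n_positions
    for seq in sequences:
        for start in range(n_positions):
            if seq[start:start + k] == kmer:
                counts[start] += 1
    return [count / len(sequences) for count in counts]


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels 1/0.

    sensitivity = TP / (TP + FN), specificity = TN / (TN + FP).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


# Toy data, invented for illustration only.
toy_sequences = ["AATAAAGT", "CATAAAGT", "AATAAACC", "GGGGGGGT"]
profile = kmer_position_profile(toy_sequences, "TA")
sens, spec = sensitivity_specificity([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
```

In a setting like the one described, a profile of this kind would be one candidate feature, and a classifier such as logistic regression would be evaluated with the two scores above.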
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3549828 | PMC |
http://dx.doi.org/10.1186/1471-2105-14-S2-S9 | DOI Listing |
Commun Biol
January 2025
Department of Chemistry, Merkert Chemistry Center, Boston College, Chestnut Hill, MA, USA.
Pseudouridine (Ψ) is an abundant RNA chemical modification that serves critical biological functions. Current Ψ detection methods are limited in identifying Ψs at base resolution in U-rich sequence contexts, where Ψ occurs frequently. Here we report "Mut-Ψ-seq", which uses the classic N-cyclohexyl-N'-(2-morpholinoethyl)carbodiimide (CMC) agent and an evolved reverse transcriptase ("RT-1306") for Ψ mapping at base resolution.
BMC Biol
December 2024
Life Science Research Centre, Faculty of Science, University of Ostrava, Ostrava, 710 00, Czech Republic.
Background: In trypanosomatids, a group of unicellular eukaryotes that includes numerous important human parasites, cis-splicing has been previously reported for only two genes: a poly(A) polymerase and an RNA helicase. Conversely, trans-splicing, which involves the attachment of a spliced leader sequence, is observed for nearly every protein-coding transcript. So far, our understanding of splicing in this protistan group has stemmed from the analysis of only a few medically relevant species.
Life Sci Alliance
February 2025
Department of Biochemistry, University of Toronto, Toronto, Canada
In humans, misprocessed mRNAs containing intact 5' Splice Site (5'SS) motifs are nuclear retained and targeted for decay by ZFC3H1, a component of the Poly(A) Exosome Targeting complex, and U1-70K, a component of the U1 snRNP. In Schizosaccharomyces pombe, the ZFC3H1 homolog, Red1, binds to the YTH domain-containing protein Mmi1 and targets certain RNA transcripts to nuclear foci for nuclear retention and decay. Here we show that YTHDC1 and YTHDC2, two YTH domain-containing proteins that bind to N6-methyladenosine (m6A) modified RNAs, interact with ZFC3H1 and U1-70K, and are required for the nuclear retention of mRNAs with intact 5'SS motifs.
BMC Plant Biol
November 2024
School of Life Sciences and State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China.
Background: Long non-coding RNAs (lncRNAs) play important roles in various biological processes, including stage development in plants. N6-methyladenosine (m6A) modification and polyadenylation are noteworthy regulatory processes that impact transcript functions by modulating their abundance. However, the specific landscapes of m6A modification and polyadenylation on lncRNAs remain largely unexplored.
Genome Res
November 2024
Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, New York 10065, USA;
RNA isoform diversity, produced via alternative splicing and alternative usage of transcription start and poly(A) sites, results in varied transcripts being derived from the same gene. Distinct isoforms can play important biological roles, including by changing the sequences or expression levels of protein products. The first single-cell approaches to RNA sequencing (and later, spatial approaches), which are now widely used for the identification of differentially expressed genes, rely on short reads and offer the ability to transcriptomically compare different cell types but are limited in their ability to measure differential isoform expression.