A multispecies polyadenylation site model.

BMC Bioinformatics

Department of Molecular Genetics, Microbiology and Immunology, University of Medicine and Dentistry of New Jersey-Robert Wood Johnson Medical School, Piscataway, New Jersey, USA.

Published: September 2013

Background: Polyadenylation is present in all three domains of life, making it the most conserved post-transcriptional process when compared with splicing and 5'-capping. Although most mammalian poly(A) sites contain a highly conserved hexanucleotide in the upstream region and a far less conserved U/GU-rich sequence in the downstream region, there are many exceptions. Furthermore, poly(A) sites in other species, such as plants and invertebrates, deviate substantially from this genomic structure, making the construction of a general poly(A) site recognition model challenging. We surveyed nine poly(A) site prediction methods published between 1999 and 2011. All of them exploit the skewed nucleotide profile across poly(A) sites and the highly conserved poly(A) signal as the primary features for recognition. These methods typically use a large number of features, which inflates the dimensionality of the models to a crippling degree, and they are typically not validated against many kinds of genomes.

Results: We propose a poly(A) site model that employs minimal features to capture the essence of poly(A) sites, yet produces better prediction accuracy across diverse species. Our model consists of three di- or trinucleotide profiles identified through principal component analysis, together with the predicted nucleosome occupancy flanking the poly(A) sites. We validated our model using two machine learning methods: logistic regression and linear discriminant analysis. The models achieve 85-92% sensitivity and 85-96% specificity in seven animals and plants. When we applied a model trained on one species to predict poly(A) sites in other species, the sensitivity scores correlated with phylogenetic distance.
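As a rough illustration of the quantities named in the Results, the sketch below computes a di- or trinucleotide frequency profile for a sequence window and the sensitivity and specificity of a set of binary predictions. It is not the authors' implementation: the function names `kmer_profile` and `sensitivity_specificity` are hypothetical, the example sequence and labels are made up, and the PCA-based profile selection, the nucleosome occupancy feature, and the classifiers themselves are omitted.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq, k):
    """Relative frequency of every DNA k-mer over overlapping windows of seq.

    Hypothetical helper: k=2 gives a dinucleotide profile, k=3 a
    trinucleotide profile. Windows containing non-ACGT symbols are ignored.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA-style input
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

    Labels are 1 for a true poly(A) site and 0 for a background site.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec


if __name__ == "__main__":
    # Toy input: the canonical mammalian poly(A) signal hexamer.
    profile = kmer_profile("AATAAA", 2)
    print({m: f for m, f in profile.items() if f > 0})
    # Made-up labels and predictions, for illustration only.
    print(sensitivity_specificity([1, 1, 0, 0], [1, 0, 0, 0]))
```

In a full pipeline, profiles like these would be computed for windows flanking each candidate site and passed, with the other features, to a classifier such as logistic regression or linear discriminant analysis.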

Conclusions: A four-feature model geared towards small motifs was sufficient to accurately learn and predict poly(A) sites across eukaryotes.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3549828
DOI: http://dx.doi.org/10.1186/1471-2105-14-S2-S9


