A multispecies polyadenylation site model.

Eric S Ho, Samuel I Gunderson, Siobain Duffy

BMC Bioinformatics

Department of Molecular Genetics, Microbiology and Immunology, University of Medicine and Dentistry of New Jersey-Robert Wood Johnson Medical School, Piscataway, New Jersey, USA.

Published: September 2013

Background: Polyadenylation is present in all three domains of life, making it the most conserved of the major post-transcriptional processes, ahead of splicing and 5'-capping. Most mammalian poly(A) sites contain a highly conserved hexanucleotide in the upstream region and a far less conserved U/GU-rich sequence in the downstream region, but there are many exceptions. Furthermore, poly(A) sites in other species, such as plants and invertebrates, deviate substantially from this structure, which makes it challenging to build a general poly(A) site recognition model. We surveyed nine poly(A) site prediction methods published between 1999 and 2011. All of them use the skewed nucleotide profile across poly(A) sites and the highly conserved poly(A) signal as their primary recognition features. These methods typically rely on a large number of features, which inflates model dimensionality to crippling degrees, and they are rarely validated against many kinds of genomes.
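The two mammalian features named in the Background, the conserved upstream hexanucleotide (canonically AAUAAA, with AUUAAA as its most common variant) and the U/GU-rich downstream region, can be checked with a few lines of code. The following is a minimal sketch and not the authors' method. It assumes that `site` is the index of the cleavage position in a DNA-alphabet sequence and that 40-nt windows are used on each side. The function names and the window sizes are hypothetical choices made for illustration.

```python
# Hypothetical sketch (not the paper's method): score a candidate poly(A)
# site for the canonical upstream hexanucleotide and a U/GU-rich
# downstream region. The 40-nt windows are illustrative assumptions.

CANONICAL_PAS = ("AATAAA", "ATTAAA")  # DNA spelling of AAUAAA and AUUAAA


def pas_hexamer_offsets(seq, site, upstream=40):
    """Offsets (negative = upstream) of canonical hexamers found in the
    `upstream` nt preceding the cleavage position `site`."""
    start = max(0, site - upstream)
    window = seq[start:site].upper()
    offsets = []
    for i in range(len(window) - 5):
        if window[i:i + 6] in CANONICAL_PAS:
            offsets.append(start + i - site)
    return offsets


def downstream_u_gu(seq, site, downstream=40):
    """(fraction of T, i.e. U in RNA; fraction of GT, i.e. GU, dinucleotides)
    in the `downstream` nt starting at `site`."""
    window = seq[site:site + downstream].upper()
    if len(window) < 2:
        return 0.0, 0.0
    u_frac = window.count("T") / len(window)
    gu = sum(1 for i in range(len(window) - 1) if window[i:i + 2] == "GT")
    return u_frac, gu / (len(window) - 1)
```

As the Background notes, many plant and invertebrate sites lack the hexamer entirely. A fixed-motif check like this one is therefore only a starting point and not a general recognizer.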

Results: We propose a poly(A) site model that uses minimal features to capture the essence of poly(A) sites, yet produces better prediction accuracy across diverse species. Our model consists of three di- or trinucleotide profiles identified through principal component analysis, plus the predicted nucleosome occupancy flanking the poly(A) sites. We validated our model using two machine learning methods: logistic regression and linear discriminant analysis. The models achieved 85-92% sensitivity and 85-96% specificity across seven animal and plant species. When a model trained on one species was used to predict poly(A) sites in other species, the sensitivity scores correlated with phylogenetic distance.
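The abstract does not say exactly how principal component analysis was used to pick the three di- or trinucleotide profiles. The sketch below makes two assumptions. First, each sequence window is summarized as a vector of di- and trinucleotide frequencies. Second, the k-mers with the largest absolute loadings on the leading principal component are the candidates to keep. PCA is done by power iteration on the sample covariance matrix so that the code needs no third-party libraries. `kmer_profile`, `leading_component` and `top_kmers` are hypothetical names.

```python
# Hypothetical sketch: di-/trinucleotide frequency profiles and the leading
# principal component via power iteration (pure Python, no NumPy).
# The k-mer set and the "largest loading" selection rule are assumptions,
# not the paper's documented procedure.
import math
import random
from itertools import product

ALPHABET = "ACGT"
# All 16 dinucleotides followed by all 64 trinucleotides.
KMERS = ["".join(p) for k in (2, 3) for p in product(ALPHABET, repeat=k)]


def kmer_profile(seq):
    """Relative frequency of each di- and trinucleotide in `seq`."""
    seq = seq.upper()
    profile = []
    for kmer in KMERS:
        k = len(kmer)
        n_windows = max(1, len(seq) - k + 1)
        count = sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == kmer)
        profile.append(count / n_windows)
    return profile


def leading_component(rows, iters=200, seed=0):
    """Unit-length leading eigenvector of the sample covariance of `rows`
    (needs at least two rows), found by power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    v = [rng.random() for _ in range(d)]
    norm0 = math.sqrt(sum(x * x for x in v))
    v = [x / norm0 for x in v]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance at all: nothing to extract
            break
        v = [x / norm for x in w]
    return v


def top_kmers(component, n=3):
    """K-mers with the largest absolute loadings on `component`."""
    order = sorted(range(len(component)),
                   key=lambda j: abs(component[j]), reverse=True)
    return [KMERS[j] for j in order[:n]]
```

To apply this, one would compute `kmer_profile` for windows around known poly(A) sites and for background windows, stack the vectors into `rows`, and then inspect `top_kmers(leading_component(rows))`. For real data, an optimized PCA implementation such as scikit-learn's `PCA` would be the usual choice.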

Conclusions: A four-feature model geared towards small motifs was sufficient to accurately learn and predict poly(A) sites across eukaryotes.
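The abstract reports training logistic regression and linear discriminant analysis on four features and scoring the results by sensitivity, TP/(TP+FN), and specificity, TN/(TN+FP). Below is a minimal pure-Python sketch of the logistic-regression half; LDA is not shown. It assumes each example has already been reduced to four numeric scores, for instance three motif-profile scores and one predicted nucleosome-occupancy score. `train_logistic`, `predict` and `sensitivity_specificity` are hypothetical names, and the learning rate and epoch count are arbitrary.

```python
# Hypothetical sketch: four-feature logistic regression trained by batch
# gradient descent, evaluated by sensitivity and specificity.
# Assumes each example is a list of 4 precomputed numeric scores.
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Minimize mean log-loss by batch gradient descent; return (w, b)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x, threshold=0.5):
    """1 = predicted poly(A) site, 0 = predicted background."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold)


def sensitivity_specificity(y_true, y_pred):
    """TP/(TP+FN) and TN/(TN+FP); 0.0 when a class is absent."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec
```

The cross-species experiment described in the Results corresponds to training with `train_logistic` on one species' examples and calling `sensitivity_specificity` on another's. For real data, scikit-learn's `LogisticRegression` and `LinearDiscriminantAnalysis` would be the usual choices.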

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3549828
DOI: http://dx.doi.org/10.1186/1471-2105-14-S2-S9

Similar Publications

Promoted read-through and mutation against pseudouridine-CMC by an evolved reverse transcriptase.

Commun Biol

January 2025

Department of Chemistry, Merkert Chemistry Center, Boston College, Chestnut Hill, MA, USA.

Zhiyong He, Weiqi Qiu, Huiqing Zhou

Pseudouridine (Ψ) is an abundant RNA chemical modification that plays critical biological functions. Current Ψ detection methods are limited in identifying Ψs at base-resolution in U-rich sequence contexts, where Ψ occurs frequently. Here we report "Mut-Ψ-seq" that utilizes the classic N-cyclohexyl N'-(2-morpholinoethyl)carbodiimide (CMC) agent and an evolved reverse transcriptase ("RT-1306") for Ψ mapping at base-resolution.

Comprehensive analysis of the Kinetoplastea intron landscape reveals a novel intron-containing gene and the first exclusively trans-splicing eukaryote.

BMC Biol

December 2024

Life Science Research Centre, Faculty of Science, University of Ostrava, Ostrava, 710 00, Czech Republic.

Alexei Yu Kostygov, Karolína Skýpalová, Natalia Kraeva, Elora Kalita, Cameron McLeod

Background: In trypanosomatids, a group of unicellular eukaryotes that includes numerous important human parasites, cis-splicing has been previously reported for only two genes: a poly(A) polymerase and an RNA helicase. Conversely, trans-splicing, which involves the attachment of a spliced leader sequence, is observed for nearly every protein-coding transcript. So far, our understanding of splicing in this protistan group has stemmed from the analysis of only a few medically relevant species.

N6-methyladenosine (m6A) promotes the nuclear retention of mRNAs with intact 5' splice site motifs.

Life Sci Alliance

February 2025

Department of Biochemistry, University of Toronto, Toronto, Canada

Eliza S Lee, Harrison W Smith, Yifan E Wang, Sean Sj Ihn, Leticia Scalize de Oliveira

In humans, misprocessed mRNAs containing intact 5' splice site (5'SS) motifs are retained in the nucleus and targeted for decay by ZFC3H1, a component of the Poly(A) Exosome Targeting complex, and U1-70K, a component of the U1 snRNP. In Schizosaccharomyces pombe, the ZFC3H1 homolog Red1 binds to the YTH domain-containing protein Mmi1 and targets certain RNA transcripts to nuclear foci for retention and decay. Here we show that YTHDC1 and YTHDC2, two YTH domain-containing proteins that bind N6-methyladenosine (m6A)-modified RNAs, interact with ZFC3H1 and U1-70K and are required for the nuclear retention of mRNAs with intact 5'SS motifs.

Nanopore direct RNA sequencing reveals N6-methyladenosine and polyadenylation landscapes on long non-coding RNAs in Arabidopsis thaliana.

BMC Plant Biol

November 2024

School of Life Sciences and State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China.

Qiaoxia Liang, Jizhou Zhang, Hon-Ming Lam, Ting-Fung Chan

Background: Long non-coding RNAs (lncRNAs) play important roles in various biological processes, including stage development in plants. N6-methyladenosine (m6A) modification and polyadenylation are notable regulatory processes that affect transcript function by modulating transcript abundance. However, the specific landscapes of m6A modification and polyadenylation on lncRNAs remain largely unexplored.

Understanding isoform expression by pairing long-read sequencing with single-cell and spatial transcriptomics.

Genome Res

November 2024

Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, New York 10065, USA.

Natan Belchikov, Justine Hsu, Xiang Jennie Li, Julien Jarroux, Wen Hu

RNA isoform diversity, produced by alternative splicing and by alternative usage of transcription start and poly(A) sites, results in varied transcripts being derived from the same gene. Distinct isoforms can play important biological roles, including by changing the sequences or expression levels of protein products. The first single-cell RNA sequencing approaches (and, later, spatial approaches), which are now widely used to identify differentially expressed genes, rely on short reads. They make it possible to compare different cell types transcriptomically, but they are limited in their ability to measure differential isoform expression.
