AI Article Synopsis

  • Proteins encoded by small open reading frames (sORFs) are essential for various biological functions and may contribute to new gene formation and species diversity.
  • A new method for predicting protein-coding sORFs, based on integrated sequence features, performs as well as or better than existing methods and can help researchers estimate potential sORF expression.
  • The study highlights differences in sequence features between eukaryotic and prokaryotic sORFs, which motivated the development of domain-specific models; a freely available web server provides access to the tool.

Article Abstract

Proteins encoded by small open reading frames (sORFs) can serve as functional elements that play important roles in vivo. Such sORFs also constitute a potential pool for de novo gene birth, driving evolutionary innovation and species diversity. Their theoretical and experimental identification has therefore become a critical issue. Herein, we propose a method for predicting protein-coding sORFs based solely on integrated sequence-derived features. Its prediction performance is better than or comparable to that of nine other prevalent methods, which shows that it can serve as a relatively reliable research tool for predicting protein-coding sORFs. The method also allows users to estimate the potential expression of a queried sORF, as demonstrated by a correlation analysis between its probability estimate and the codon adaptation index (CAI). Using these features, we show that the sequence features of protein-coding sORFs differ significantly between the two domains (eukaryotes and prokaryotes), implying that cross-domain prediction may be relatively difficult; domain-specific models were therefore developed, allowing users to predict protein-coding sORFs in both eukaryotes and prokaryotes. Finally, a web server was developed to facilitate research in this field; it is freely available at http://guolab.whu.edu.cn/codingCapacity/index.html.
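
The abstract refers to the codon adaptation index (CAI) without defining it. As a rough illustration only, and not the authors' implementation, the following Python sketch computes CAI in the usual way: each codon receives a relative adaptiveness w (its frequency in a reference gene set divided by the frequency of the most used synonymous codon), and the CAI of a sequence is the geometric mean of w over its codons. The function names, the pseudocount of 0.5, and the toy sequences are assumptions made for this example; a real analysis would use a reference set of highly expressed genes from the organism of interest.

```python
import math
from collections import Counter, defaultdict

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons(seq):
    """Split a nucleotide sequence into complete in-frame codons."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def relative_adaptiveness(reference_seqs, pseudocount=0.5):
    """Compute w for every codon that has at least one synonym.

    The pseudocount (an assumption of this sketch) keeps w above zero
    for codons that never occur in the reference set.
    """
    counts = Counter(
        c for s in reference_seqs for c in codons(s) if c in CODON_TABLE
    )
    by_amino_acid = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        by_amino_acid[aa].append(codon)

    w = {}
    for aa, synonyms in by_amino_acid.items():
        # Stop codons and single-codon amino acids (Met, Trp) are skipped.
        if aa == "*" or len(synonyms) == 1:
            continue
        top = max(counts[c] + pseudocount for c in synonyms)
        for c in synonyms:
            w[c] = (counts[c] + pseudocount) / top
    return w


def cai(seq, w):
    """Geometric mean of w over the scorable codons of seq."""
    logs = [math.log(w[c]) for c in codons(seq) if c in w]
    if not logs:
        raise ValueError("sequence contains no scorable codons")
    return math.exp(sum(logs) / len(logs))


# Toy usage with a hypothetical one-gene reference set.
reference = ["ATGGCTGCTGCTGCCAAATAA"]
weights = relative_adaptiveness(reference)
print(cai("ATGGCTGCTAAATAA", weights))
print(cai("ATGGCCAAGTAA", weights))
```

A sequence that uses only the codons preferred in the reference set scores close to 1, while one that relies on rarely used codons scores lower, which is why CAI is commonly used as a proxy for expression level.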

Source
http://dx.doi.org/10.1016/j.ymeth.2022.12.003

Publication Analysis

Top Keywords

protein-coding sorfs: 16
prediction protein-coding: 8
sequence-derived features: 8
sorfs: 6
prediction: 5
protein-coding small: 4
small orfs: 4
orfs multi-species: 4
multi-species integrated: 4
integrated sequence-derived: 4

Similar Publications

Plant genomes, through their evolutionary journey, have developed a complex composition that includes not only protein-coding sequences but also a significant amount of non-coding DNA, repetitive sequences, and transposable elements, traditionally labeled as "junk DNA". RNA molecules from these regions, labeled as "transcriptional junk," include non-coding RNAs, alternatively spliced transcripts, untranslated regions (UTRs), and short open reading frames (sORFs). However, recent research shows that this genetic material plays crucial roles in gene regulation, affecting plant growth, development, hormonal balance, and responses to stresses.

Background: This study is based on deep mining of Ribo-seq data for the identification of lncRNAs that have highly expressed sORFs in HCC. In this paper, dynamic prospects associated with sORFs acting as newly defined tumor-specific epitopes are discussed with possible improvement in strategies for tumor immunotherapy.

Objective: Using ribosome profiling to identify and characterize sORFs within lncRNAs in HCC, identify potential therapeutic targets and tumor-specific epitopes applicable for immunotherapy.

Over the past 15 years, hundreds of previously undiscovered bacterial small open reading frame (sORF)-encoded polypeptides (SEPs) of fewer than fifty amino acids have been identified, and biological functions have been ascribed to an increasing number of SEPs from intergenic regions and small RNAs. However, despite numbering in the dozens in , and hundreds to thousands in humans, same-strand nested sORFs that overlap protein coding genes in alternative reading frames remain understudied. In order to provide insight into this enigmatic class of unannotated genes, we characterized GndA, a 36-amino acid, heat shock-regulated SEP encoded within the +2 reading frame of the gene in K-12 MG1655.

Protein-coding potential of non-canonical open reading frames in human transcriptome.

Biochem Biophys Res Commun

December 2023

Centre for Genomics and Personalised Health, Queensland University of Technology, Brisbane, Queensland, 4059, Australia; Cancer Precision Medicine Group, QIMR Berghofer Medical Research Institute, 300 Herston Road, Herston, Queensland, 4006, Australia; Faculty of Health, Queensland University of Technology, Brisbane, Queensland, 4059, Australia; Faculty of Medicine, The University of Queensland, Queensland, 4072, Australia.

In recent years, proteogenomics and ribosome profiling studies have identified a large number of proteins encoded by noncoding regions in the human genome. They are encoded by small open reading frames (sORFs) in the untranslated regions (UTRs) of mRNAs and in long non-coding RNAs (lncRNAs). These sORF-encoded proteins (SEPs) are often shorter than 150 amino acids and show poor evolutionary conservation.

Long non-coding RNAs (lncRNAs), which are usually thought to have no protein-coding ability, are widely involved in cell proliferation, signal transduction and other biological activities. However, recent studies have suggested that short open reading frames (sORFs) of some lncRNAs can encode small functional peptides (micropeptides). These micropeptides appear to play important roles in calcium homeostasis, embryonic development and tumorigenesis, suggesting their potential as therapeutic targets and diagnostic biomarkers.
