Proteins encoded by small open reading frames (sORFs) can serve as functional elements that play important roles in vivo. Such sORFs also constitute a potential pool for de novo gene birth, driving evolutionary innovation and species diversity. Their theoretical and experimental identification has therefore become a critical issue. Here, we propose a method for predicting protein-coding sORFs that is based solely on integrative sequence-derived features. Its prediction performance is better than or comparable to that of nine other prevalent methods, which suggests that it can serve as a relatively reliable research tool for predicting protein-coding sORFs. The method also allows users to estimate the potential expression of a queried sORF, as demonstrated by a correlation analysis between our possibility estimate and the codon adaptation index (CAI). Based on the features we used, we show that the sequence features of protein-coding sORFs differ significantly between the two domains (eukaryotes and prokaryotes), implying that cross-domain prediction may be relatively difficult. We therefore developed domain-specific models, which allow users to predict protein-coding sORFs in both eukaryotes and prokaryotes. Finally, we developed a web server to facilitate research in this field; it is freely available at http://guolab.whu.edu.cn/codingCapacity/index.html.
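The abstract relates the method's score to the codon adaptation index (CAI) without saying how CAI was computed. As a rough illustration only, the sketch below follows the commonly used definition: each codon gets a weight equal to its count in a reference gene set divided by the count of the most frequent synonymous codon, and the CAI of a query is the geometric mean of those weights. The function names, the pseudocount of 0.5 for unseen codons, and the toy reference sequence are assumptions made for this example; they are not taken from the paper or its web server.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (NCBI table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(seq):
    """Split a DNA sequence into in-frame codons, dropping a trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_seqs, pseudocount=0.5):
    """Weight of each codon = its count / count of the best synonymous codon.

    Met, Trp and stop codons are excluded because they have no synonymous
    alternatives (or are not translated). The pseudocount for unseen codons
    is an assumption of this sketch.
    """
    counts = Counter(c for s in reference_seqs for c in codons(s) if c in CODON_TABLE)
    weights = {}
    for aa in set(AMINO) - set("*MW"):
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        fam_counts = {c: counts[c] or pseudocount for c in family}
        top = max(fam_counts.values())
        for c in family:
            weights[c] = fam_counts[c] / top
    return weights


def cai(seq, weights):
    """Geometric mean of codon weights over the scored codons of seq."""
    logs = [math.log(weights[c]) for c in codons(seq) if c in weights]
    if not logs:
        raise ValueError("sequence contains no codons that can be scored")
    return math.exp(sum(logs) / len(logs))


# Hypothetical reference set: a single toy ORF, for illustration only.
toy_weights = relative_adaptiveness(["ATGGCTGCTGCTGCCTAA"])
toy_cai = cai("ATGGCCGCTTAA", toy_weights)
```

In practice the reference set would be a collection of highly expressed genes from the organism of interest, and a score like `toy_cai` would then be compared with the predictor's output across many sORFs.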
DOI: http://dx.doi.org/10.1016/j.ymeth.2022.12.003
Curr Opin Plant Biol
December 2024
CSIR-Central Institute of Medicinal and Aromatic Plants (CSIR-CIMAP), P.O. CIMAP, Near Kukrail Picnic Spot, Lucknow, 226 015, India.
Plant genomes, through their evolutionary journey, have developed a complex composition that includes not only protein-coding sequences but also a significant amount of non-coding DNA, repetitive sequences, and transposable elements, traditionally labeled as "junk DNA". RNA molecules from these regions, labeled as "transcriptional junk," include non-coding RNAs, alternatively spliced transcripts, untranslated regions (UTRs), and short open reading frames (sORFs). However, recent research shows that this genetic material plays crucial roles in gene regulation, affecting plant growth, development, hormonal balance, and responses to stresses.
Genes Genomics
September 2024
Department of Medicine, College of Medicine, Seoul National University, Seoul, Republic of Korea.
Background: This study is based on deep mining of Ribo-seq data to identify lncRNAs that contain highly expressed sORFs in hepatocellular carcinoma (HCC). The paper discusses the prospects of sORFs as newly defined tumor-specific epitopes and how they might improve strategies for tumor immunotherapy.
Objective: To use ribosome profiling to identify and characterize sORFs within lncRNAs in HCC, and to identify potential therapeutic targets and tumor-specific epitopes applicable to immunotherapy.
bioRxiv
June 2024
Department of Chemistry, Yale University, New Haven, CT 06511.
Over the past 15 years, hundreds of previously undiscovered bacterial small open reading frame (sORF)-encoded polypeptides (SEPs) of fewer than fifty amino acids have been identified, and biological functions have been ascribed to an increasing number of SEPs from intergenic regions and small RNAs. However, despite numbering in the dozens in E. coli, and hundreds to thousands in humans, same-strand nested sORFs that overlap protein-coding genes in alternative reading frames remain understudied. To provide insight into this enigmatic class of unannotated genes, we characterized GndA, a 36-amino acid, heat shock-regulated SEP encoded within the +2 reading frame of the gnd gene in E. coli K-12 MG1655.
Biochem Biophys Res Commun
December 2023
Centre for Genomics and Personalised Health, Queensland University of Technology, Brisbane, Queensland, 4059, Australia; Cancer Precision Medicine Group, QIMR Berghofer Medical Research Institute, 300 Herston Road, Herston, Queensland, 4006, Australia; Faculty of Health, Queensland University of Technology, Brisbane, Queensland, 4059, Australia; Faculty of Medicine, The University of Queensland, Queensland, 4072, Australia.
In recent years, proteogenomics and ribosome profiling studies have identified a large number of proteins encoded by noncoding regions of the human genome. They are encoded by small open reading frames (sORFs) in the untranslated regions (UTRs) of mRNAs and in long non-coding RNAs (lncRNAs). These sORF-encoded proteins (SEPs) are often shorter than 150 amino acids and show poor evolutionary conservation.
Zhejiang Da Xue Xue Bao Yi Xue Ban
August 2023
College of Life Sciences, Zhejiang University, Hangzhou 310058, China.
Long non-coding RNAs (lncRNAs), which are usually thought to have no protein-coding ability, are widely involved in cell proliferation, signal transduction and other biological activities. However, recent studies have suggested that short open reading frames (sORFs) of some lncRNAs can encode small functional peptides (micropeptides). These micropeptides appear to play important roles in calcium homeostasis, embryonic development and tumorigenesis, suggesting their potential as therapeutic targets and diagnostic biomarkers.