Common and phylogenetically widespread coding for peptides by bacterial small RNAs.

Robin C Friedman, Stefan Kalkhof, Olivia Doppelt-Azeroual, Stephan A Mueller, Martina Chovancová, Martin von Bergen, Benno Schwikowski

BMC Genomics

Systems Biology Laboratory, Department of Genomes and Genetics, Institut Pasteur, Paris, France.

Published: July 2017

Background: While eukaryotic noncoding RNAs have recently received intense scrutiny, it is becoming clear that bacterial transcription is at least as pervasive. Bacterial small RNAs and antisense RNAs (sRNAs) are often assumed to be noncoding because they lack long open reading frames (ORFs). However, there are numerous examples of sRNAs encoding small proteins, whether or not they also have a regulatory role at the RNA level.

Methods: Here, we apply flexible machine learning techniques based on sequence features and comparative genomics to quantify the prevalence of sRNA ORFs under natural selection to maintain protein-coding function in 14 phylogenetically diverse bacteria. Importantly, we quantify uncertainty in our predictions, and follow up on them using mass spectrometry proteomics and comparison to datasets including ribosome profiling.

Results: A majority of annotated sRNAs have at least one ORF between 10 and 50 amino acids long, and we conservatively predict that 409±191.7 unannotated sRNA ORFs are under selection to maintain coding (mean estimate and 95% confidence interval), an average of 29 per species considered here. This implies that overall at least 10.3±0.5% of sRNAs have a coding ORF, and in some species around 20% do. 165±69 of these novel coding ORFs have some antisense overlap to annotated ORFs. As experimental validation, many of our predictions show evidence of translation in published ribosome profiling data and are identified by mass spectrometry shotgun proteomics. B. subtilis sRNAs with coding ORFs are enriched for high expression in biofilms and confluent growth, and S. pneumoniae sRNAs with coding ORFs are involved in virulence. sRNA coding ORFs are enriched for transmembrane domains, and many are predicted to be novel components of type I toxin/antitoxin systems.
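To make the ORF-length criterion above concrete, here is a minimal Python sketch that lists candidate ORFs of 10 to 50 amino acids in an sRNA sequence. It is not the authors' pipeline: the function name `find_small_orfs`, the start-codon set (ATG, GTG, TTG, chosen to reflect common bacterial usage), measuring length in codons excluding the stop, and keeping only the most upstream start before each stop are all illustrative assumptions.

```python
# Hypothetical helper, not the study's method: scan the three forward frames of an
# sRNA sequence for ORFs whose protein length (stop codon excluded) is 10-50 aa.
# Assumption: bacterial start codons ATG/GTG/TTG; RNA input (U) is converted to DNA (T).
STARTS = {"ATG", "GTG", "TTG"}
STOPS = {"TAA", "TAG", "TGA"}


def find_small_orfs(seq: str, min_aa: int = 10, max_aa: int = 50) -> list[tuple[int, int]]:
    """Return sorted 0-based, half-open (start, end) coordinates; end includes the stop codon."""
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None  # position of the most upstream open start codon in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in STARTS:
                start = i
            elif start is not None and codon in STOPS:
                aa_len = (i - start) // 3  # codons from start up to, not including, the stop
                if min_aa <= aa_len <= max_aa:
                    orfs.append((start, i + 3))
                start = None  # look for the next start codon downstream
    return sorted(orfs)
```

For example, `find_small_orfs("ATG" + "GCT" * 12 + "TAA")` returns `[(0, 42)]`, a 13-codon ORF, while a 2-codon ORF such as `"ATGAAATAA"` is ignored. A scan like this only enumerates candidates; as the Methods describe, sequence features and comparative genomics are what distinguish ORFs under selection to maintain coding from chance ORFs.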

Conclusions: We predict over two dozen new protein-coding genes per bacterial species and, crucially, we also quantify the uncertainty in this estimate. Our predictions for sRNA coding ORFs, along with predicted novel type I toxins and tools for sorting and visualizing genomic context, are freely available in a user-friendly format at http://disco-bac.web.pasteur.fr. We expect these easily accessible predictions to be a valuable tool for the study not only of bacterial sRNAs and type I toxin-antitoxin systems, but also of bacterial genetics and genomics.

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5521070
DOI: http://dx.doi.org/10.1186/s12864-017-3932-y

Similar Publications

Ribosome profiling shows variable sensitivity to detect open reading frames for conventional and different types of cryptic T cell antigens.

Mol Ther Methods Clin Dev

March 2025

Department of Hematology, Leiden University Medical Center, Leiden, the Netherlands.

Kyra J Fuchs, Sofia Thomaidou, Arno R van der Slik, Marian van de Meent, Peter A C 't Hoen

T cell-based immunotherapies targeting antigens on tumor cells have shown efficacy as anti-cancer treatments. While neoantigens are created by somatic mutations acquired during tumorigenesis, allogeneic stem cell transplantation as treatment for hematological malignancies exploits minor histocompatibility antigens encoded by genetic differences between patients and donors. Screening methods to predict neoantigens and minor histocompatibility antigens typically consider only conventional antigens created by nonsynonymous mutations or polymorphisms coding for amino acid changes in canonical open reading frames (ORFs).

Demystifying the functions of the mitochondrial ORFan proteins in bivalves with doubly uniparental inheritance.

Biol Lett

January 2025

Département de sciences biologiques, Université de Montréal, Montréal, QC, Canada.

Julie Brémaud, Alizée Debelli, Hajar Hosseini Khorami, Donald T Stewart, Annie Angers

Strict maternal inheritance of mitochondria is known to be the rule in animals, but over 100 species across six orders of bivalves possess doubly uniparental inheritance (DUI) of mitochondria. Under DUI, two distinctive sex-specific mitogenomes coexist. In marine and freshwater mussels, each mitogenome has an additional protein-coding gene, called the female- or male-specific open reading frame, respectively.

PSAURON: a tool for assessing protein annotation across a broad range of species.

NAR Genom Bioinform

March 2025

Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA.

Markus J Sommer, Aleksey V Zimin, Steven L Salzberg

Evaluating the accuracy of protein-coding sequences in genome annotations is a challenging problem for which there is no broadly applicable solution. In this manuscript, we introduce PSAURON (Protein Sequence Assessment Using a Reference ORF Network), a novel software tool developed to help assess the quality of protein-coding gene annotations. Utilizing a machine learning model trained on a diverse dataset from over 1000 plant and animal genomes, PSAURON assigns a score to coding DNA or protein sequence that reflects the likelihood that the sequence is a genuine protein-coding region.

LncSL: A Novel Stacked Ensemble Computing Tool for Subcellular Localization of lncRNA by Amino Acid-Enhanced Features and Two-Stage Automated Selection Strategy.

Int J Mol Sci

December 2024

School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, School of Software, Changzhou University, Changzhou 213164, China.

Lun Zhu, Hong Chen, Sen Yang

Long non-coding RNA (lncRNA) is a non-coding RNA longer than 200 nucleotides, crucial for functions like cell cycle regulation and gene transcription. Accurate localization prediction from sequence information is vital for understanding lncRNA's biological roles. Computational methods offer an effective alternative to traditional experimental methods for annotating lncRNA subcellular positions.

Fixation of Expression Divergences by Natural Selection in Coding Genes.

Int J Mol Sci

December 2024

College of Life Science, Shaanxi Normal University, Xi'an 710119, China.

Cheng Qi, Qiang Wei, Yuting Ye, Jing Liu, Guishuang Li

Functional divergences of coding genes can be caused by divergences in their coding sequences and expression. However, whether and how expression divergences and coding sequence divergences coevolve is not clear. Gene expression divergences in differentiated cells and tissues recapitulate developmental models within a species, while gene expression divergences between analogous cells and tissues resemble traditional phylogenies in different species, suggesting that gene expression divergences are molecular traits that can be used for evolutionary studies.