Common and phylogenetically widespread coding for peptides by bacterial small RNAs.

BMC Genomics

Systems Biology Laboratory, Department of Genomes and Genetics, Institut Pasteur, Paris, France.

Published: July 2017

Background: While eukaryotic noncoding RNAs have recently received intense scrutiny, it is becoming clear that bacterial transcription is at least as pervasive. Bacterial small RNAs and antisense RNAs (sRNAs) are often assumed to be noncoding because they lack long open reading frames (ORFs). However, there are numerous examples of sRNAs that encode small proteins, whether or not they also have a regulatory role at the RNA level.

Methods: Here, we apply flexible machine learning techniques based on sequence features and comparative genomics to quantify the prevalence of sRNA ORFs under natural selection to maintain protein-coding function in 14 phylogenetically diverse bacteria. Importantly, we quantify uncertainty in our predictions, and follow up on them using mass spectrometry proteomics and comparison to datasets including ribosome profiling.

Results: A majority of annotated sRNAs have at least one ORF between 10 and 50 amino acids long, and we conservatively predict that 409±191.7 unannotated sRNA ORFs are under selection to maintain coding (mean estimate and 95% confidence interval), an average of 29 per species considered here. This implies that overall at least 10.3±0.5% of sRNAs have a coding ORF, and in some species around 20% do. 165±69 of these novel coding ORFs have some antisense overlap to annotated ORFs. As experimental validation, many of our predictions are translated in published ribosome profiling data and are identified via mass spectrometry shotgun proteomics. B. subtilis sRNAs with coding ORFs are enriched for high expression in biofilms and confluent growth, and S. pneumoniae sRNAs with coding ORFs are involved in virulence. sRNA coding ORFs are enriched for transmembrane domains and many are predicted novel components of type I toxin/antitoxin systems.
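To make the "ORF between 10 and 50 amino acids long" criterion concrete, here is a minimal Python sketch of a six-frame scan for small ORFs in an sRNA sequence. It is an illustration only, not the authors' pipeline: the function name `find_small_orfs`, the choice of start codons (ATG, GTG, TTG), and the rule of reporting only the first start codon upstream of each stop are all assumptions made for this example.

```python
from typing import List, Tuple

# Assumption: common bacterial start codons; the abstract does not specify which were used.
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_small_orfs(seq: str, min_aa: int = 10, max_aa: int = 50) -> List[Tuple[str, int, int, int]]:
    """Scan all six frames for ORFs encoding min_aa..max_aa residues.

    Returns (strand, start, end, length_aa) tuples. Coordinates are 0-based,
    end-exclusive, include the stop codon, and for the "-" strand refer to
    the reverse-complemented sequence. length_aa counts the start codon but
    not the stop codon.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            i = frame
            while i + 3 <= len(s):
                if s[i:i + 3] not in START_CODONS:
                    i += 3
                    continue
                # Walk codon by codon to the first in-frame stop.
                j = i + 3
                while j + 3 <= len(s) and s[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 > len(s):
                    break  # no stop downstream in this frame
                n_aa = (j - i) // 3
                if min_aa <= n_aa <= max_aa:
                    hits.append((strand, i, j + 3, n_aa))
                i = j + 3  # resume after the stop codon
    return hits


# Hypothetical toy sequence: start codon, 11 alanine codons, stop codon.
toy = "ATG" + "GCT" * 11 + "TAA"
print(find_small_orfs(toy))
```

A scan like this only enumerates candidate ORFs; deciding which candidates are under selection to maintain coding, as the abstract describes, requires the sequence-feature and comparative-genomics modelling that this sketch does not attempt.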

Conclusions: We predict over two dozen new protein-coding genes per bacterial species and, crucially, also quantify the uncertainty in this estimate. Our predictions for sRNA coding ORFs, along with predicted novel type I toxins and tools for sorting and visualizing genomic context, are freely available in a user-friendly format at http://disco-bac.web.pasteur.fr. We expect these easily accessible predictions to be a valuable tool for the study not only of bacterial sRNAs and type I toxin-antitoxin systems, but also of bacterial genetics and genomics.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5521070
DOI: http://dx.doi.org/10.1186/s12864-017-3932-y

Publication Analysis

Top Keywords

coding orfs: 20
srnas coding: 12
orfs: 9
coding: 8
bacterial small: 8
small rnas: 8
srna orfs: 8
selection maintain: 8
mass spectrometry: 8
orfs enriched: 8

Similar Publications

T cell-based immunotherapies targeting antigens on tumor cells have shown efficacy as anti-cancer treatments. While neoantigens are created by somatic mutations acquired during tumorigenesis, allogeneic stem cell transplantation as treatment for hematological malignancies exploits minor histocompatibility antigens encoded by genetic differences between patients and donors. Screening methods to predict neoantigens and minor histocompatibility antigens typically consider only conventional antigens created by nonsynonymous mutations or polymorphisms coding for amino acid changes in canonical open reading frames (ORFs).

Strict maternal inheritance of mitochondria is known to be the rule in animals, but over 100 species across six orders of bivalves possess doubly uniparental inheritance (DUI) of mitochondria. Under DUI, two distinctive sex-specific mitogenomes coexist. In marine and freshwater mussels, each mitogenome has an additional protein-coding gene, called the female-specific or male-specific open reading frame, respectively.

Evaluating the accuracy of protein-coding sequences in genome annotations is a challenging problem for which there is no broadly applicable solution. In this manuscript, we introduce PSAURON (Protein Sequence Assessment Using a Reference ORF Network), a novel software tool developed to help assess the quality of protein-coding gene annotations. Utilizing a machine learning model trained on a diverse dataset from over 1000 plant and animal genomes, PSAURON assigns a score to coding DNA or protein sequence that reflects the likelihood that the sequence is a genuine protein-coding region.

LncSL: A Novel Stacked Ensemble Computing Tool for Subcellular Localization of lncRNA by Amino Acid-Enhanced Features and Two-Stage Automated Selection Strategy.

Int J Mol Sci

December 2024

School of Computer Science and Artificial Intelligence Aliyun School of Big Data School of Software, Changzhou University, Changzhou 213164, China.

Long non-coding RNA (lncRNA) is a non-coding RNA longer than 200 nucleotides, crucial for functions like cell cycle regulation and gene transcription. Accurate localization prediction from sequence information is vital for understanding lncRNA's biological roles. Computational methods offer an effective alternative to traditional experimental methods for annotating lncRNA subcellular positions.

Functional divergences of coding genes can be caused by divergences in their coding sequences and expression. However, whether and how expression divergences and coding sequence divergences coevolve is not clear. Gene expression divergences in differentiated cells and tissues recapitulate developmental models within a species, while gene expression divergences between analogous cells and tissues resemble traditional phylogenies in different species, suggesting that gene expression divergences are molecular traits that can be used for evolutionary studies.
