The structure of an RNA sequence encodes information about its biological function. Dynamic programming algorithms are often used to predict the conformation of an RNA molecule from its sequence alone, and adding experimental data as auxiliary information improves prediction accuracy. This auxiliary data is typically incorporated into the nearest neighbor thermodynamic model by converting the data into pseudoenergies.
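The conversion of probing data into pseudoenergies can be illustrated with a small sketch. The abstract above does not say which conversion it uses, so the log-linear form ΔG(i) = m·ln(r_i + 1) + b shown here is only one commonly cited choice for SHAPE reactivities. The function name `shape_pseudoenergy`, the default slope and intercept (m = 2.6, b = -0.8 kcal/mol), and the handling of missing values are assumptions made for illustration.

```python
import math


def shape_pseudoenergy(reactivity, m=2.6, b=-0.8):
    """Log-linear pseudoenergy (kcal/mol) for a single nucleotide.

    Assumption: a reactivity of None or below zero means "no data",
    and such positions receive no pseudoenergy term.
    """
    if reactivity is None or reactivity < 0:
        return 0.0
    return m * math.log(reactivity + 1.0) + b


# Hypothetical reactivity profile; None marks a position without data.
reactivities = [0.0, 0.1, 1.0, None]
pseudoenergies = [shape_pseudoenergy(r) for r in reactivities]
```

Under this form, low reactivities give a negative (stabilizing) term and high reactivities a positive (penalizing) one. A folding program would add the term to the free energy of base pairs or stacks that involve that nucleotide.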
A growing number of RNA sequences are now known to exist in some distribution with two or more different stable structures. Recent algorithms attempt to reconstruct such mixtures using the list of nucleotides in a sequence in conjunction with auxiliary experimental footprinting data. In this paper, we demonstrate some challenges which remain in addressing this problem; in particular we consider the difficulty of reconstructing a mixture of two RNA structures across a spectrum of different relative abundances.
Nucleic Acids Res
December 2014
As the biomedical impact of small RNAs grows, so does the need to understand competing structural alternatives for regions of functional interest. Suboptimal structure analysis provides significantly more RNA base pairing information than a single minimum free energy prediction. Yet computational enhancements like Boltzmann sampling have not been fully adopted by experimentalists since identifying meaningful patterns in this data can be challenging.
We analyze the distribution of RNA secondary structures given by the Knudsen-Hein stochastic context-free grammar used in the prediction program Pfold. Our main theorem gives relations between the expected number of these motifs--independent of the grammar probabilities. These relations are a consequence of proving that the distribution of base pairs, of helices, and of different types of loops is asymptotically Gaussian in this model of RNA folding.
There are two important problems in the assembly of small, icosahedral RNA viruses. First, how does the capsid protein select the viral RNA for packaging, when there are so many other candidate RNA molecules available? Second, what is the mechanism of assembly? With regard to the first question, there are a number of cases where a particular RNA sequence or structure--often one or more stem-loops--either promotes assembly or is required for assembly, but there are others where specific packaging signals are apparently not required. With regard to the assembly pathway, in those cases where stem-loops are involved, the first step is generally believed to be binding of the capsid proteins to these "fingers" of the RNA secondary structure.
Recent advances in RNA structure determination include using data from high-throughput probing experiments to improve thermodynamic prediction accuracy. We evaluate the extent and nature of improvements in data-directed predictions for a diverse set of 16S/18S ribosomal sequences using a stochastic model of experimental SHAPE data. The average accuracy for 1000 data-directed predictions always improves over the original minimum free energy (MFE) structure.
Satellite tobacco mosaic virus (STMV) is an icosahedral T=1 single-stranded RNA virus with a genome containing 1058 nucleotides. X-ray crystallography revealed a structure containing 30 double-helical RNA segments, with each helix having nine base pairs and an unpaired nucleotide at the 3' end of each strand. Based on this structure, Larson and McPherson proposed a model of 30 hairpin-loop elements occupying the edges of the icosahedron and connected by single-stranded regions.
Background: Accurate and efficient RNA secondary structure prediction remains an important open problem in computational molecular biology. Historically, advances in computing technology have enabled faster and more accurate RNA secondary structure predictions. Previous parallelized prediction programs achieved significant improvements in runtime, but their implementations were not portable from niche high-performance computers or easily accessible to most RNA researchers.
Motivated by recent work in parametric sequence alignment, we study the parameter space for scoring RNA folds and construct an RNA polytope. A vertex of this polytope corresponds to RNA secondary structures with common branching. We use this polytope and its normal fan to study the effect of varying three parameters in the free energy model that are not determined experimentally.
The identification of small structural motifs and their organization into larger subassemblies is of fundamental interest in the analysis, prediction and design of 3D structures of large RNAs. This problem has been studied only sparsely, as most of the existing work is limited to the characterization and discovery of motifs in RNA secondary structures. We present a novel geometric method for the characterization and identification of structural motifs in 3D rRNA molecules.
Bull Math Biol
January 2009
We give a Large Deviation Principle (LDP) with explicit rate function for the distribution of vertex degrees in plane trees, a combinatorial model of RNA secondary structures. We calculate the typical degree distributions based on nearest neighbor free energies, and compare our results with the branching configurations found in two sets of large RNA secondary structures. We find substantial agreement overall, with some interesting deviations which merit further study.
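To make the object of study in this abstract concrete, the sketch below tabulates an empirical degree distribution for a single plane tree. It rests on assumptions that are not taken from the paper: the tree is written as a balanced-parenthesis string in which each matched pair is one edge below an implicit root, and "degree" means the number of children of a vertex, which may differ from the paper's convention. The function names are hypothetical.

```python
from collections import Counter


def child_counts(tree):
    """Return the number of children of each vertex in preorder, root first.

    Assumption: `tree` contains only '(' and ')' and is balanced; each
    matched pair is one edge leading to a new vertex.
    """
    counts = [0]   # index 0 is the implicit root
    stack = [0]    # path from the root to the current vertex
    for ch in tree:
        if ch == "(":
            counts[stack[-1]] += 1      # the current vertex gains a child
            counts.append(0)
            stack.append(len(counts) - 1)
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced: unexpected ')'")
            stack.pop()
        else:
            raise ValueError(f"unexpected character {ch!r}")
    if len(stack) != 1:
        raise ValueError("unbalanced: missing ')'")
    return counts


def degree_distribution(tree):
    """Map each child count to the number of vertices that have it."""
    return Counter(child_counts(tree))
```

For example, `degree_distribution("(()())()")` counts three leaves and two vertices with two children each. Counts like these, taken over a set of structures, are what one would compare against a predicted typical distribution.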