BMC Bioinformatics
September 2018
Background: Massive genomic data sets from high-throughput sequencing allow for new insights into complex biological systems such as microbial communities. Analyses of their diversity and structure are typically preceded by clustering millions of 16S rRNA gene sequences into OTUs. Swarm introduced a new clustering strategy which addresses important conceptual and performance issues of the popular de novo clustering approach.
View Article and Find Full Text PDFThe structure of RNA has been the subject of intense research over the last decades due to its importance for the correct functioning of RNA molecules in biological processes. Hence, a large number of models for RNA folding and corresponding algorithms for structure prediction have been developed. However, previous models often only consider base pairs, although every base is capable of up to three edge-to-edge interactions with other bases.
View Article and Find Full Text PDFVarious tools used to predict the secondary structure for a given RNA sequence are based on dynamic programming used to compute a conformation of minimum free energy. For structures without pseudoknots, a worst-case runtime proportional to n3, with n being the length of the sequence, results since a table of dimension n2 has to be filled in while a single entry gives rise to a linear computational effort. However, it was recently observed that reformulating the corresponding dynamic programming recursion together with the bookkeeping of potential folding alternatives (a technique called sparsification) may reduce the runtime to n2 on average, assuming that nucleotides of distance d form a hydrogen bond (i.
View Article and Find Full Text PDFIn this paper we present a sampling framework for RNA structures of fixed topological genus. We introduce a novel, linear time, uniform sampling algorithm for RNA structures of fixed topological genus g, for arbitrary g>0. Furthermore we develop a linear time sampling algorithm for RNA structures of fixed topological genus g that are weighted by a simplified, loop-based energy functional.
View Article and Find Full Text PDFPredicting RNA structures with pseudoknots in general is an NP-complete problem. Accordingly, several authors have suggested subclasses that provide polynomial time prediction algorithms by allowing (respectively, disallowing) certain structural motives. In this article, we introduce a unifying algebraic view on most of these classes.
View Article and Find Full Text PDFBackground: Over the past years, statistical and Bayesian approaches have become increasingly appreciated to address the long-standing problem of computational RNA structure prediction. Recently, a novel probabilistic method for the prediction of RNA secondary structures from a single sequence has been studied which is based on generating statistically representative and reproducible samples of the entire ensemble of feasible structures for a particular input sequence. This method samples the possible foldings from a distribution implied by a sophisticated (traditional or length-dependent) stochastic context-free grammar (SCFG) that mirrors the standard thermodynamic model applied in modern physics-based prediction algorithms.
View Article and Find Full Text PDFTheory Biosci
December 2011
Predicting secondary structures of RNA molecules is one of the fundamental problems of and thus a challenging task in computational structural biology. Over the past decades, mainly two different approaches have been considered to compute predictions of RNA secondary structures from a single sequence: the first one relies on physics-based and the other on probabilistic RNA models. Particularly, the free energy minimization (MFE) approach is usually considered the most popular and successful method.
View Article and Find Full Text PDFJ Bioinform Comput Biol
December 2011
Background: The study of microbial diversity and community structures heavily relies on the analyses of sequence data, predominantly taxonomic marker genes like the small subunit of the ribosomal RNA (SSU rRNA) amplified from environmental samples. Until recently, the "gold standard" for this strategy was the cloning and Sanger sequencing of amplified target genes, usually restricted to a few hundred sequences per sample due to relatively high costs and labor intensity. The recent introduction of massive parallel tag sequencing strategies like pyrosequencing (454 sequencing) has opened a new window into microbial biodiversity research.
View Article and Find Full Text PDFAlgorithms Mol Biol
October 2011
Background: Random biological sequences are a topic of great interest in genome analysis since, according to a powerful paradigm, they represent the background noise from which the actual biological information must differentiate. Accordingly, the generation of random sequences has been investigated for a long time. Similarly, random object of a more complicated structure like RNA molecules or proteins are of interest.
View Article and Find Full Text PDFIn this article, we compute the limit distributions of the numbers of hairpin-loops, interior-loops and bulges in k-noncrossing RNA structures. The latter are coarse-grained RNA structures allowing for cross-serial interactions, subject to the constraint that there are at most k - 1 mutually crossing arcs in the diagram representation of the molecule. We prove central limit theorems by means of studying the corresponding bivariate generating functions.
View Article and Find Full Text PDFMotivation: Several dynamic programming algorithms for predicting RNA structures with pseudoknots have been proposed that differ dramatically from one another in the classes of structures considered.
Results: Here, we use the natural topological classification of RNA structures in terms of irreducible components that are embeddable in the surfaces of fixed genus. We add to the conventional secondary structures four building blocks of genus one in order to construct certain structures of arbitrarily high genus.
IEEE/ACM Trans Comput Biol Bioinform
April 2012
There are two custom ways for predicting RNA secondary structures: minimizing the free energy of a conformation according to a thermodynamic model and maximizing the probability of a folding according to a stochastic model. In most cases, stochastic grammars are used for the latter alternative applying the maximum likelihood principle for determining a grammar's probabilities. In this paper, building on such a stochastic model, we will analyze the expected minimum free energy of an RNA molecule according to Turner's energy rules.
View Article and Find Full Text PDFOver the last few decades, much effort has been taken to develop approaches for identifying good predictions of RNA secondary structure. This is due to the fact that most computational prediction methods based on free energy minimization compute a number of suboptimal foldings and we have to identify the native folding among all these possible secondary structures. Using the abstract shapes approach as introduced by Giegerich et al.
View Article and Find Full Text PDFThe most probable secondary structure of an RNA molecule, given the nucleotide sequence, can be computed efficiently if a stochastic context-free grammar (SCFG) is used as the prior distribution of the secondary structure. The structures of some RNA molecules contain so-called pseudoknots. Allowing all possible configurations of pseudoknots is not compatible with context-free grammar models and makes the search for an optimal secondary structure NP-complete.
View Article and Find Full Text PDFWithin this paper we investigate the Bernoulli model for random secondary structures of ribonucleic acid (RNA) molecules. Assuming that two random bases can form a hydrogen bond with probability p we prove asymptotic equivalents for the averaged number of hairpins and bulges, the averaged loop length, the expected order, the expected number of secondary structures of size n and order k and further parameters all depending on p. In this way we get an insight into the change of shape of a random structure during the process 1 p -->0.
View Article and Find Full Text PDFThe secondary structure of an RNA molecule is of great importance and possesses influence, e.g., on the interaction of tRNA molecules with proteins or on the stabilization of mRNA molecules.
View Article and Find Full Text PDF