Motivation: Next-generation sequencing has become an important tool in molecular biology. Various protocols to investigate genomic, transcriptomic and epigenomic features across virtually all species and tissues have been devised. For most of these experiments, one of the first crucial steps of bioinformatic analysis is the mapping of reads to reference genomes.
Background: The genome is pervasively transcribed but most transcripts do not code for proteins, constituting non-protein-coding RNAs. Despite increasing numbers of functional reports of individual long non-coding RNAs (lncRNAs), assessing the extent of functionality among the non-coding transcriptional output of mammalian cells remains intricate. In the protein-coding world, transcripts differentially expressed in the context of processes essential for the survival of multicellular organisms have been instrumental in the discovery of functionally relevant proteins and their deregulation is frequently associated with diseases.
Motivation: Computer-assisted studies of structure, function and evolution of viruses remain a neglected area of research. The attention of bioinformaticians to this interesting and challenging field is far from commensurate with its medical and biotechnological importance. It is telling that out of >200 talks held at ISMB 2013, the largest international bioinformatics conference, only one presentation explicitly dealt with viruses.
Numerous high-throughput sequencing studies have focused on detecting conventionally spliced mRNAs in RNA-seq data. However, non-standard RNAs arising through gene fusion, circularization or trans-splicing are often neglected. We introduce a novel, unbiased algorithm to detect splice junctions from single-end cDNA sequences.
Sugar beet (Beta vulgaris ssp. vulgaris) is an important crop of temperate climates which provides nearly 30% of the world's annual sugar production and is a source for bioethanol and animal feed. The species belongs to the order of Caryophyllales, is diploid with 2n = 18 chromosomes, has an estimated genome size of 714-758 megabases and shares an ancient genome triplication with other eudicot plants.
IEEE/ACM Trans Comput Biol Bioinform
August 2014
G-quadruplexes are abundant locally stable structural elements in nucleic acids. The combinatorial theory of RNA structures and the dynamic programming algorithms for RNA secondary structure prediction are extended here to incorporate G-quadruplexes using a simple but plausible energy model. With preliminary energy parameters, we find that the overwhelming majority of putative quadruplex-forming sequences in the human genome are likely to fold into canonical secondary structures instead.
Motivation: Although small nucleolar RNAs form an important class of non-coding RNAs, no comprehensive annotation efforts have been undertaken, presumably because the task is complicated by both the large number of distinct small nucleolar RNA families and their relatively rapid pace of sequence evolution.
Results: With snoStrip we present an automatic annotation pipeline developed specifically for comparative genomics of small nucleolar RNAs. It makes use of sequence conservation, canonical box motifs as well as secondary structure and predicts putative targets.
Circular and apparently trans-spliced RNAs have recently been reported as abundant types of transcripts in mammalian transcriptome data. Both types of non-colinear RNAs are also abundant in RNA-seq of different tissue from both the African and the Indonesian coelacanth. We observe more than 8,000 lincRNAs with normal gene structure and several thousands of circularized and trans-spliced products, showing that such atypical RNAs form a substantial contribution to the transcriptome.
Ribosomal and small nuclear RNAs (snRNAs) comprise numerous modified nucleotides. The modification patterns are retained during evolution, making it even possible to project them from yeast onto human. The stringent conservation of modification sites and the slow evolution of rRNAs and snRNAs contrast with the rapid evolution of small nucleolar RNA (snoRNA) sequences.
Due to their function as adapters in translation, tRNA molecules share a common structural organization in all kingdoms and organelles with ribosomal protein biosynthesis. A typical tRNA has a cloverleaf-like secondary structure, consisting of acceptor stem, D-arm, anticodon arm, a variable region, and T-arm, with an average length of 73 nucleotides. In several mitochondrial genomes, however, tRNA genes encode transcripts that show a considerable deviation from this standard, having reduced D- or T-arms or even completely lacking one of these elements, resulting in tRNAs as small as 66 nts.
Eukaryotic histones carry a diverse set of specific chemical modifications that accumulate over the life-time of a cell and have a crucial impact on the cell state in general and the transcriptional program in particular. Replication constitutes a dramatic disruption of the chromatin states that effectively amounts to partial erasure of stored information. To preserve its epigenetic state the cell reconstructs (at least part of) the histone modifications by means of processes that are still very poorly understood.
The chromosome 9p21 (Chr9p21) locus of coronary artery disease has been identified in the first surge of genome-wide association and is the strongest genetic factor of atherosclerosis known today. Chr9p21 encodes the long non-coding RNA (ncRNA) antisense non-coding RNA in the INK4 locus (ANRIL). ANRIL expression is associated with the Chr9p21 genotype and correlated with atherosclerosis severity.
Evolutionarily conserved RNA secondary structures are a robust indicator of purifying selection and, consequently, molecular function. Evaluating their genome-wide occurrence through comparative genomics has consistently been plagued by high false-positive rates and divergent predictions. We present a novel benchmarking pipeline aimed at calibrating the precision of genome-wide scans for consensus RNA structure prediction.
RNA has become an integral building material in synthetic biology. Dominated by their secondary structures, which can be computed efficiently, RNA molecules are amenable not only to in vitro and in vivo selection, but also to rational, computation-based design. While the inverse folding problem of constructing an RNA sequence with a prescribed ground-state structure has received considerable attention for nearly two decades, there have been few efforts to design RNAs that can switch between distinct prescribed conformations.
The duck (Anas platyrhynchos) is one of the principal natural hosts of influenza A viruses. We present the duck genome sequence and perform deep transcriptome analyses to investigate immune-related genes. Our data indicate that the duck possesses a contractive immune gene repertoire, as in chicken and zebra finch, and this repertoire has been shaped through lineage-specific duplications.
Prokaryotic transcripts constitute almost always uninterrupted intervals when mapped back to the genome. Split reads, i.e.
About 2800 mitochondrial genomes of Metazoa are present in NCBI RefSeq today, two thirds belonging to vertebrates. Metazoan phylogeny was recently challenged by large scale EST approaches (phylogenomics), stabilizing classical nodes while simultaneously supporting new sister group hypotheses. The use of mitochondrial data in deep phylogeny analyses was often criticized because of high substitution rates on nucleotides, large differences in amino acid substitution rate between taxa, and biases in nucleotide frequencies.
Correct annotation of protein coding genes is the basis of conventional data analysis in proteomic studies. Nevertheless, most protein sequence databases almost exclusively rely on gene finding software and inevitably also miss protein annotations or possess errors. Proteogenomics tries to overcome these issues by matching MS data directly against a genome sequence database.
The function of many non-coding RNA genes and cis-regulatory elements of messenger RNA largely depends on the structure, which is in turn determined by their sequence. Single nucleotide polymorphisms (SNPs) and other mutations may disrupt the RNA structure, interfere with the molecular function and hence cause a phenotypic effect. RNAsnp is an efficient method to predict the effect of SNPs on local RNA secondary structure based on the RNA folding algorithms implemented in the Vienna RNA package.
Background: The search for distant homologs has become an important issue in genome annotation. A particular difficulty is posed by divergent homologs that have lost recognizable sequence similarity. This same problem also arises in the recognition of novel members of large classes of RNAs such as snoRNAs or microRNAs that consist of families unrelated by common descent.
The discovery of a living coelacanth specimen in 1938 was remarkable, as this lineage of lobe-finned fish was thought to have become extinct 70 million years ago. The modern coelacanth looks remarkably similar to many of its ancient relatives, and its evolutionary proximity to our own fish ancestors provides a glimpse of the fish that first walked on land. Here we report the genome sequence of the African coelacanth, Latimeria chalumnae.
Telomerase is a ribonucleoprotein (RNP) enzyme essential for telomere maintenance and chromosome stability. While the catalytic telomerase reverse transcriptase (TERT) protein is well conserved across eukaryotes, telomerase RNA (TR) is extensively divergent in size, sequence, and structure. This diversity prohibits TR identification from many important organisms.
Structural characteristics are essential for the functioning of many noncoding RNAs and cis-regulatory elements of mRNAs. SNPs may disrupt these structures, interfere with their molecular function, and hence cause a phenotypic effect. RNA folding algorithms can provide detailed insights into structural effects of SNPs.
Riboswitches are regulatory RNA elements typically located in the 5'-untranslated region of certain mRNAs and control gene expression at the level of transcription or translation. These elements consist of a sensor and an adjacent actuator domain. The sensor usually is an aptamer that specifically interacts with a ligand.
Gene structure data can substantially advance our understanding of metazoan evolution and deliver an independent approach to resolve conflicts among existing hypotheses. Here, we used changes of spliceosomal intron positions as a novel phylogenetic marker to reconstruct the animal tree. This kind of data is inferred from orthologous genes containing mutually exclusive introns at pairs of sequence positions in close proximity, so-called near intron pairs (NIPs).