Dinucleotide controlled null models for comparative RNA gene prediction.

BMC Bioinformatics

Center for Integrative Bioinformatics Vienna, Max F. Perutz Laboratories, Dr. Bohr-Gasse 9, A-1030 Vienna, Austria.

Published: May 2008

Background: Comparative prediction of RNA structures can be used to identify functional noncoding RNAs in genomic screens. Babak et al. [BMC Bioinformatics. 8:33] recently showed that RNA gene prediction programs can be biased by genomic dinucleotide content, in particular programs that use a thermodynamic folding model including stacking energies. Consequently, there is a need for dinucleotide-preserving control strategies to assess the significance of such predictions. While randomization algorithms for single sequences have existed for many years, the problem has remained challenging for multiple alignments, and no algorithm is currently available.
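
For reference, the classic single-sequence case can be solved with the Altschul-Erickson approach: treat every overlapping dinucleotide as a directed edge between nucleotides and draw a random Eulerian path through the resulting multigraph. The Python sketch below illustrates that swapped-in technique only; it is not the SISSIz alignment algorithm. The function names are hypothetical, and the random "last exit" tree is drawn by simple rejection sampling.

```python
import random
from collections import Counter, defaultdict

def dinucleotide_counts(seq):
    """Count overlapping dinucleotides, e.g. 'ACG' -> {'AC': 1, 'CG': 1}."""
    return Counter(a + b for a, b in zip(seq, seq[1:]))

def _reaches_last(last_edge, last):
    """True if following last-exit edges from every vertex ends at `last` (no cycles)."""
    for start in last_edge:
        seen = set()
        v = start
        while v != last:
            if v in seen:
                return False
            seen.add(v)
            v = last_edge[v]
    return True

def dinucleotide_shuffle(seq, rng=random):
    """Random sequence with the same dinucleotide counts and the same first/last letter."""
    if len(seq) <= 2:
        return seq
    # Multigraph: one directed edge a -> b per overlapping dinucleotide ab.
    edges = defaultdict(list)
    for a, b in zip(seq, seq[1:]):
        edges[a].append(b)
    last = seq[-1]
    # Pick one "last exit" edge per vertex (except the final letter) until these
    # edges form a tree leading to `last`; the input sequence guarantees one exists.
    while True:
        choice = {v: rng.randrange(len(out)) for v, out in edges.items() if v != last}
        last_edge = {v: edges[v][i] for v, i in choice.items()}
        if _reaches_last(last_edge, last):
            break
    # Randomly order each vertex's other edges, keeping its last-exit edge at the end.
    order = {}
    for v, out in edges.items():
        rest = list(out)
        if v != last:
            tail = rest.pop(choice[v])
            rng.shuffle(rest)
            rest.append(tail)
        else:
            rng.shuffle(rest)
        order[v] = rest
    # Walk the edges from the first letter; this traces an Eulerian path.
    used = Counter()
    result = [seq[0]]
    v = seq[0]
    for _ in range(len(seq) - 1):
        nxt = order[v][used[v]]
        used[v] += 1
        result.append(nxt)
        v = nxt
    return "".join(result)
```

Because every shuffled sequence uses exactly the same multiset of edges and starts at the same nucleotide, dinucleotide counts (and therefore mononucleotide composition) are preserved exactly. Applying such a shuffle independently to each row of an alignment, however, would not preserve the alignment's column structure, such as conservation and gap patterns.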

Results: We present a program called SISSIz that simulates multiple alignments with a given average dinucleotide content. To meet additional requirements of an accurate null model, the randomized alignments have, on average, the same sequence diversity as the input and preserve its local conservation and gap patterns. We use a phylogenetic substitution model that includes overlapping dependencies and site-specific rates. Using fast heuristics and a distance-based approach, a tree is estimated under this model and used to guide the simulations. The new algorithm is tested on vertebrate genomic alignments, and its effect on RNA structure predictions is studied. In addition, we combined the new null model directly with the RNAalifold consensus folding algorithm, yielding a new variant of a thermodynamic, structure-based RNA gene finding program that is not biased by dinucleotide content.
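
To make the properties the null model is meant to preserve concrete, the sketch below computes three simple summaries of an alignment: average dinucleotide frequencies, mean pairwise identity as a proxy for sequence diversity, and the gap pattern. The exact definitions used by SISSIz are not given in this abstract; we assume that dinucleotides are counted on each row after removing gaps and pooled over rows, that identity is measured over columns where both rows are ungapped, and that "-" is the gap character. All names are hypothetical.

```python
from collections import Counter
from itertools import combinations

GAP = "-"  # assumed gap character

def average_dinucleotide_freqs(alignment):
    """Dinucleotide frequencies pooled over all rows, each row read with gaps removed."""
    counts = Counter()
    for row in alignment:
        ungapped = row.replace(GAP, "")
        counts.update(a + b for a, b in zip(ungapped, ungapped[1:]))
    total = sum(counts.values())
    return {d: c / total for d, c in counts.items()} if total else {}

def mean_pairwise_identity(alignment):
    """Mean fraction of identical residues, over columns where both rows are ungapped."""
    identities = []
    for x, y in combinations(alignment, 2):
        pairs = [(a, b) for a, b in zip(x, y) if a != GAP and b != GAP]
        if pairs:
            identities.append(sum(a == b for a, b in pairs) / len(pairs))
    return sum(identities) / len(identities) if identities else 0.0

def gap_pattern(alignment):
    """Boolean mask per row: True where the row has a gap."""
    return [[c == GAP for c in row] for row in alignment]
```

Comparing these summaries between a native alignment and its randomized counterparts is one way to check that a simulation behaves as an accurate null model.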

Conclusion: SISSIz implements an efficient algorithm to randomize multiple alignments while preserving dinucleotide content. It can be used to obtain more accurate estimates of the false positive rates of existing programs, to produce negative controls for training machine learning based programs, or as a standalone RNA gene finding program. Other applications in comparative genomics that require randomization of multiple alignments are also conceivable.
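
Both estimating false positive rates and using the null model for gene finding come down to comparing scores against a null distribution obtained from randomized alignments. The sketch below assumes such scores (for example consensus minimum free energies from RNAalifold) have already been computed; it does not call RNAalifold. The function names, the z-score form, and the convention that a score at or below the cutoff counts as a positive prediction are illustrative assumptions, not details taken from this abstract.

```python
from statistics import mean, stdev

def z_score(native_score, null_scores):
    """Standardize a native score against scores of randomized alignments.

    Negative values mean the native score is lower (e.g. a more stable MFE)
    than the mean of the null distribution.
    """
    if len(null_scores) < 2:
        raise ValueError("need at least two null scores")
    sd = stdev(null_scores)
    if sd == 0:
        raise ValueError("null scores have zero variance")
    return (native_score - mean(null_scores)) / sd

def false_positive_rate(null_scores, cutoff):
    """Fraction of randomized (negative-control) alignments called positive,
    assuming a prediction is positive when its score is <= cutoff."""
    if not null_scores:
        raise ValueError("no null scores")
    return sum(s <= cutoff for s in null_scores) / len(null_scores)
```

Under this convention, lower (more negative) scores indicate more stable predicted structures, so strongly negative z-scores flag candidate RNA genes.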

Availability: SISSIz is available as open source C code that can be compiled for all major platforms and downloaded from http://sourceforge.net/projects/sissiz.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2453142
DOI: http://dx.doi.org/10.1186/1471-2105-9-248
