Tricross: using dot-plots in sequence-id space to detect uncataloged intergenic features.

Bioinformatics

Children's Research Institute, The Ohio State University, 700 Childrens Dr., Columbus, OH 43205, USA.

Published: December 2001

Motivation: Determining the functional sequence content of an organism is confounded by several factors. Large protein-coding sequences are relatively easy to find by statistical methods. Smaller proteins, however, may escape detection because their size falls below some arbitrary researcher-defined minimum cutoff, or because a promoter or translational start cannot be precisely defined (Delcher et al., Nucleic Acids Res., 27, 4636-4641, 1999). Promoter and regulatory sequences themselves are difficult to define because of the significant amount of allowable sequence variation and the probable lack, to date, of any completely accurate whole-organism gene catalog. Finally, certain genes coding for functional RNAs may have insufficient structural or sequence constraints to be detectable by normal sequence structure/pattern searching methods (Eddy and Rivas, Bioinformatics, 16, 583-605, 2000). Where multiple closely related organisms have been sequenced, additional information is available for investigating sequence content: the possible conservation of functional sequences between the organisms. We present a method that uses this conservation to detect genes and other potentially functional sequences that may be missed by standard ORF-calling, RNA-finding, and pattern-matching software. The tricross programs produce a multi-way cross comparison of three sets of sequences, determine which sequences are conserved in all three sets, and produce a graphical representation in Virtual Reality Modelling Language (VRML; ISO/IEC 14772-1:1997, VDC, 1997) as well as alignments of all sequence triples found. The software can also be applied to a pair of sequence sets, though the noise in the results increases.
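The three-way comparison described above can be illustrated with a minimal sketch. It is not the tricross implementation: the k-mer Jaccard score stands in for whatever alignment-based similarity measure the package actually uses, and the function names, the threshold, and the toy sequences are all assumptions made for illustration. The sketch keeps only those index triples for which all three pairwise comparisons pass the threshold.

```python
from itertools import product


def similarity(a: str, b: str, k: int = 4) -> float:
    """Jaccard similarity of k-mer sets (hypothetical stand-in for an alignment score)."""
    kmers_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    kmers_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    if not kmers_a or not kmers_b:
        return 0.0
    return len(kmers_a & kmers_b) / len(kmers_a | kmers_b)


def pairwise_hits(set_x, set_y, threshold):
    """Return the set of (i, j) index pairs whose similarity meets the threshold."""
    return {
        (i, j)
        for (i, x), (j, y) in product(enumerate(set_x), enumerate(set_y))
        if similarity(x, y) >= threshold
    }


def conserved_triples(set_a, set_b, set_c, threshold=0.5):
    """Return sorted (i, j, k) triples that are mutually similar across all three sets."""
    ab = pairwise_hits(set_a, set_b, threshold)
    bc = pairwise_hits(set_b, set_c, threshold)
    ac = pairwise_hits(set_a, set_c, threshold)
    return sorted(
        (i, j, k)
        for (i, j) in ab
        for (j2, k) in bc
        if j == j2 and (i, k) in ac  # require the third pairing as well
    )


# Toy "intergenic" sequences, invented for this example only.
set_a = ["ACGTACGTGGCCTTAA", "TTTTTTTTTTTTTTTT"]
set_b = ["GGGGGGGGCCCCCCCC", "ACGTACGTGGCCTTAT"]
set_c = ["ACGTACGTGGCCTTAC", "CACACACACACACACA"]

triples = conserved_triples(set_a, set_b, set_c)
print(triples)
```

Requiring all three pairwise hits is what suppresses noise relative to a two-set comparison: a chance match between two sets is discarded unless the third set supports it.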

Results: Tricross has been used to examine the intergenic-sequence content of the three archaeal Pyrococcus genomes to determine the most highly related sequences remaining between the annotated protein and RNA coding sequences. With relatively stringent similarity requirements set for the search, tricross found 101 intergenic sequences conserved among the three organisms. Interestingly, 29 of these appear to contain members of a family of small RNA molecules (Kiss-Laszlo et al., EMBO J., 17, 797-807, 1998) only recently discovered in the Archaea (Armbruster, OSU, Diss., 1988; Omer et al., Science, 288, 517-522, 2000; Gaspin et al., J. Mol. Biol., 297, 895-906, 2000). While some of the remaining 72 appear to be individual highly conserved promoter sequences, others have no currently known biological significance. Although tricross was originally developed to facilitate the examination of intergenic sequences, none of its logic is inherently specific to them. The software can also be applied to gene sequences, and has been used to produce inter-genomic gene-order dot-plots for Haemophilus influenzae (Fleischmann et al., Science, 269, 496-512, 1995) versus H. ducreyi (unpublished data), and for Neisseria meningitidis Z2491 (serogroup A) (Parkhill et al., Nature, 404, 502-506, 2000) versus Neisseria meningitidis Z58 (serogroup B) (Tettelin et al., Science, 287, 1809-1815, 2000) versus Neisseria gonorrhoeae (Lewis et al., http://micro-gen.ouhsc.edu/, 2000).
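A gene-order dot-plot in sequence-id space can be sketched as follows. Each axis indexes sequences by their ordinal position in a genome rather than by nucleotide coordinate, so a hit between sequence i of one genome and sequence j of another becomes a single point (i, j), and conserved gene order shows up as a diagonal run. This is a two-genome, text-only illustration under assumed names and invented hit data; tricross itself works with three sets and renders VRML, and this sketch detects only forward diagonals, not inversions.

```python
def dot_plot(hits, n_x, n_y):
    """Render hits as a text grid: columns are ids in genome X, rows are ids in genome Y."""
    grid = [["." for _ in range(n_x)] for _ in range(n_y)]
    for i, j in hits:
        grid[j][i] = "*"
    return "\n".join("".join(row) for row in grid)


def diagonal_runs(hits, min_len=2):
    """Find forward diagonal runs, returned as ((start_i, start_j), length)."""
    hit_set = set(hits)
    runs = []
    for i, j in sorted(hit_set):
        if (i - 1, j - 1) in hit_set:
            continue  # this point continues a run that started earlier
        length = 1
        while (i + length, j + length) in hit_set:
            length += 1
        if length >= min_len:
            runs.append(((i, j), length))
    return runs


# Hypothetical hits between sequence ids of two genomes (invented data).
hits = [(0, 2), (1, 3), (2, 4), (4, 0)]

plot = dot_plot(hits, n_x=5, n_y=5)
runs = diagonal_runs(hits)
print(plot)
print(runs)
```

Plotting by sequence id rather than by base position keeps the plot compact and makes rearrangements visible as breaks or offsets between diagonal segments.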

Availability: The tricross software package is available from http://www.biosci.ohio-state.edu/~ray/bioinformatics/tricross.html.

Contact: ray@biosci.ohio-state.edu; daniels.7@osu.edu; munsonr@pediatrics.ohio-state.edu

Supplementary Information: Additional data from the cross-genomic comparisons examined in the discussion section are linked from http://www.biosci.ohio-state.edu/~ray/bioinformatics/tricross.html.

Source: http://dx.doi.org/10.1093/bioinformatics/17.12.1105
