Genomic data analysis using DNA structure: an analysis of conserved nongenic sequences and ultraconserved elements.

J Chem Inf Model

Centre for Chemical Biology, Krebs Institute for Biomolecular Science, Department of Chemistry, University of Sheffield, Sheffield S3 7HF, United Kingdom.

Published: September 2006

AI Article Synopsis

  • Recent comparative studies of the human and mouse genomes have identified conserved nongenic sequences (CNGs) and ultraconserved elements (UCEs); both sets are extremely highly conserved, extend over hundreds of bases, and have no known function.
  • An alignment-free approach is needed to analyze these sequences because there is no detectable sequence homology between paralogous CNGs or UCEs within either species.
  • The research utilizes Fourier techniques to examine the structural properties of these sequences based on a database of 32,896 unique DNA octamers, aiming to uncover potential functions through structural correlations.
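The figure of 32,896 is consistent with counting each octamer and its reverse complement as one double-stranded sequence: (4^8 + 4^4) / 2 = 32,896, where the 4^4 term covers octamers that equal their own reverse complement. The source does not spell out this derivation, so the sketch below is only a check of the arithmetic under that assumption; the function names are illustrative.

```python
from itertools import product

# Watson-Crick base pairing used to build the complementary strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(seq: str) -> str:
    """Pick one representative for a sequence and its reverse complement."""
    return min(seq, reverse_complement(seq))


def count_unique_kmers(k: int) -> int:
    """Count k-mers, treating a k-mer and its reverse complement as the same."""
    return len({canonical("".join(p)) for p in product("ACGT", repeat=k)})


if __name__ == "__main__":
    # Enumerates all 4**8 = 65,536 octamers and collapses strand-equivalent pairs.
    print(count_unique_kmers(8))
```

Running the script enumerates every octamer, so it confirms the closed-form count by brute force rather than relying on the formula.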

Article Abstract

Recent comparative studies of the human and mouse genomes have revealed sets of conserved nongenic sequences (CNGs) and sets of ultraconserved elements (UCEs). Both sets of sequences, which exhibit extremely high levels of conservation, extend over hundreds of bases and have no known function. Since there is no detectable sequence homology between paralogous CNGs or UCEs in either of the species, an alignment-free technique is needed for their analysis. We have previously compiled a database of the structural properties of all 32,896 unique DNA octamers, including information on stability, the minimum energy conformation, and flexibility. We have used Fourier techniques to analyze the UCEs and CNGs in terms of their octamer structural properties, to reveal structural correlations which may indicate possible functions for some of these sequences.
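As a rough illustration of what an alignment-free, Fourier-based analysis of octamer properties might look like, the sketch below slides an 8-base window along a sequence, looks up one numeric property per octamer, and computes a power spectrum of the resulting profile. The abstract does not say which Fourier technique or normalization was used, and the structural database itself is not reproduced here, so `toy_property` (GC fraction) is a stand-in for a real stability, conformation, or flexibility value, and all function names are hypothetical.

```python
import cmath
from typing import Callable, List


def toy_property(octamer: str) -> float:
    """Placeholder property (GC fraction); NOT a value from the octamer database."""
    return sum(base in "GC" for base in octamer) / len(octamer)


def property_profile(seq: str, k: int = 8,
                     prop: Callable[[str], float] = toy_property) -> List[float]:
    """Map a sequence to one property value per overlapping k-mer window."""
    return [prop(seq[i:i + k]) for i in range(len(seq) - k + 1)]


def power_spectrum(signal: List[float]) -> List[float]:
    """Power at frequencies 0..n/2 from a plain discrete Fourier transform."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the constant offset
    spectrum = []
    for f in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * f * t / n)
                    for t, x in enumerate(centered))
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum


def dominant_period(signal: List[float]) -> float:
    """Period (in window positions) of the strongest non-zero frequency."""
    spec = power_spectrum(signal)
    strongest = max(range(1, len(spec)), key=spec.__getitem__)
    return len(signal) / strongest


if __name__ == "__main__":
    # Synthetic sequence with a built-in 10-base repeat, trimmed so the
    # profile has exactly 100 points.
    seq = ("GGGGGAAAAA" * 11)[:107]
    print(dominant_period(property_profile(seq)))
```

Because the profile depends only on the property values along the sequence, two sequences with no detectable sequence similarity can still be compared through their spectra, which is the appeal of this kind of approach for the CNGs and UCEs described above.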


Source
http://dx.doi.org/10.1021/ci050384i

Publication Analysis

Top Keywords

conserved nongenic: 8
nongenic sequences: 8
ultraconserved elements: 8
structural properties: 8
genomic data: 4
data analysis: 4
analysis dna: 4
dna structure: 4
structure analysis: 4
analysis conserved: 4

Similar Publications

Sequence, Structure, and Functional Space of Drosophila De Novo Proteins.

Genome Biol Evol

August 2024

Institute for Evolution and Biodiversity, University of Muenster, Huefferstrasse 1, 48149 Muenster, Germany.

During de novo emergence, new protein coding genes emerge from previously nongenic sequences. The de novo proteins they encode are dissimilar in composition and predicted biochemical properties to conserved proteins. However, functional de novo proteins indeed exist.


Small open reading frames (sORFs; <300 nucleotides or <100 amino acids) are widespread across all genomes, and an increasing variety of them appear to be translated from non-genic regions. Over the past few decades, peptides produced from sORFs have been identified as functional in various organisms, from bacteria to humans. Despite recent advances in next-generation sequencing and proteomics, accurate annotation and classification of sORFs remain a rate-limiting step toward reliable and high-throughput detection of small proteins from non-genic regions.


Sesame (Sesamum indicum L.) is an ancient oilseed crop belonging to the family Pedaliaceae and is cultivated globally for oil and food. In this study, 2496 sesame accessions conserved at the National Genebank of ICAR-National Bureau of Plant Genetic Resources (NBPGR) were genotyped using a genomics-assisted double-digest restriction-associated DNA sequencing (ddRAD-seq) approach.


Genomes encode genes and non-coding DNA, both capable of transcriptional activity. However, unlike canonical genes, many transcripts from non-coding DNA have limited evidence of conservation or function. Here, to determine how much biological noise is expected from non-genic sequences, we quantify the regulatory activity of evolutionarily naive DNA using RNA-seq in yeast and computational predictions in humans.


Amaranth Genomic Resource Database: an integrated database resource of Amaranth genes and genomics.

Front Plant Sci

June 2023

Division of Genomic Resources, ICAR-National Bureau of Plant Genetic Resources, New Delhi, India.

Amaranth (Amaranthus L.) is native to Mexico and North America, where it was cultivated thousands of years ago, but now amaranth is grown worldwide. Amaranth is one of the most promising food crops with high nutritional value and belongs to the family Amaranthaceae.

