Guanine+cytosine (GC) content varies markedly along the chromosomes of homeotherms, and great effort has been devoted to studying this heterogeneity and its biological implications. Already before the DNA-sequencing era, however, it was established that the dinucleotides in the DNA of mammals in particular, and of most organisms in general, show striking over- and under-representations that cannot be explained by the base composition. Here we show that in the coding regions of vertebrates both GC content and codon occurrences are strongly correlated with such "motif preferences", even though we quantify the latter using an index that is not affected by the base composition, codon usage, and protein-sequence encoding. These correlations are likely to be the result of the long-term shaping of the primary structure of genic and non-genic DNA by a mutation regime whose central features have been maintained by natural selection. Indeed, we find that these preferences are conserved in vertebrates even more rigidly than codon occurrences, and we show that the occurrence-preference correlations are stronger in intronic and non-genic DNA, with R² values reaching 99% when GC content is approximately 0.5. The mutation regime appears to be characterized by rates that depend markedly on the bases present at the site preceding and at that following each mutating site: when we estimate such rates of neighbor-base-dependent mutation (NBDM) from substitutions retrieved from alignments of coding, intronic, and non-genic mammalian DNA sorted and grouped by GC content, they suffice to simulate DNA sequences in which motif occurrences and preferences, as well as the correlations of motif preferences with GC content and with motif occurrences, are very similar to the mammalian ones.
The best fit, however, is obtained with NBDM regimes lacking strand effects, which indicates that over the long term NBDM switches strands in the germline, as one would expect for effects due to loosely contained background transcription. Finally, we show that human coding regions are less mutable under the estimated NBDM regimes than under matched context-independent mutation, and that this entails marked differences between the spectra of amino-acid mutations that each mutation regime should generate. In the Discussion we examine the mechanisms likely to underlie NBDM heterogeneity along chromosomes and propose that it reflects how the diversity and activity of lesion-bypass polymerases (LBPs) track the landscapes of scheduled and non-scheduled genome repair, replication, and transcription during the cell cycle. We conclude that the primary structure of vertebrate genic DNA at and below the trinucleotide level has been governed over the long term by highly conserved regimes of NBDM, which should be under direct natural selection because they drastically alter missense-mutation rates and hence the somatic and the germline mutational loads. Therefore, the non-coding DNA of vertebrates may have been shaped by NBDM only epiphenomenally, with non-genic DNA being affected mainly when found in the proximity of genes.
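The notion of dinucleotide over- and under-representation "not explained by the base composition" can be illustrated with the classical observed/expected odds ratio. The sketch below is only an illustration under stated assumptions: it corrects for base composition alone and is not the index used in the paper, which is also insensitive to codon usage and protein-sequence encoding. The function name `dinucleotide_odds_ratios` is hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def dinucleotide_odds_ratios(seq):
    """Return observed/expected ratios for all 16 dinucleotides.

    Expected frequency of XY is f(X) * f(Y), i.e. what base composition
    alone would predict. Ratios above 1 indicate over-representation,
    below 1 under-representation. This is an illustrative index only,
    not the composition- and codon-corrected index of the paper.
    """
    seq = seq.upper()
    # Count single bases, ignoring ambiguous symbols such as N.
    base_counts = Counter(b for b in seq if b in BASES)
    # Count overlapping dinucleotides made of unambiguous bases only.
    pair_counts = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in BASES and seq[i + 1] in BASES
    )
    total_bases = sum(base_counts.values())
    total_pairs = sum(pair_counts.values())
    if total_bases == 0 or total_pairs == 0:
        raise ValueError("sequence contains no usable dinucleotides")

    ratios = {}
    for x, y in product(BASES, repeat=2):
        expected = (base_counts[x] / total_bases) * (base_counts[y] / total_bases)
        observed = pair_counts[x + y] / total_pairs
        # Undefined when one of the two bases is absent from the sequence.
        ratios[x + y] = observed / expected if expected > 0 else float("nan")
    return ratios
```

Applied to a long genomic sequence, a ratio well below 1 for a given dinucleotide would indicate under-representation relative to what base frequencies alone predict; on short sequences the ratios are noisy and should be read with caution.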
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2366069 | PMC |
| http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0002145 | PLOS |
eLife
December 2024
Center for RNA Research, Institute for Basic Science, Seoul, Republic of Korea.
Although HIV-1 integration sites favor active transcription units in the human genome, high-resolution analysis of individual HIV-1 integration sites has shown that the virus can integrate into a variety of host genomic locations, including non-genic regions. The invisible infection by HIV-1 integrating into non-genic regions, challenging the traditional understanding of HIV-1 integration site selection, is more problematic because they are selected for preservation in the host genome during prolonged antiretroviral therapies. Here, we showed that HIV-1 integrates its viral genome into the vicinity of R-loops, a genomic structure composed of DNA-RNA hybrids.
Biosystems
December 2024
Department of Biomedical and Molecular Sciences, Queen's University, Kingston, K7L3N6, Canada.
The peace of the world is challenged by societal confrontations that can often be labeled "racial" or "ethnic." Emblematic of this is discrimination based on skin colour. William Bateson's background suggests sympathy with the black emancipation movement.
BMC Genomics
September 2024
Mitsubishi Research Institute, Inc., Tokyo, Japan.
Background: Novel protein-coding genes were considered to be born by re-organization of pre-existing genes, such as gene duplication and gene fusion. However, recent progress of genome research revealed that more protein-coding genes than expected were born de novo, that is, gene origination by accumulating mutations in non-genic DNA sequences. Nonetheless, the in-depth process (scenario) for de novo origination is not well understood.
Genome Biol Evol
July 2024
Institute for Evolution and Biodiversity, University of Münster, Hüfferstrasse 1, Münster 48149, Germany.
For protein coding genes to emerge de novo from a non-genic DNA, the DNA sequence must gain an open reading frame (ORF) and the ability to be transcribed. The newborn de novo gene can further evolve to accumulate changes in its sequence. Consequently, it can also elongate or shrink with time.
Plant Genome
June 2024
ICAR-National Bureau of Plant Genetic Resources, PUSA Campus, New Delhi, India.
Sesame (Sesamum indicum L.) is an ancient oilseed crop belonging to the family Pedaliaceae and a globally cultivated crop for its use as oil and food. In this study, 2496 sesame accessions, being conserved at the National Genebank of ICAR-National Bureau of Plant Genetic Resources (NBPGR), were genotyped using genomics-assisted double-digest restriction-associated DNA sequencing (ddRAD-seq) approach.