The study of correlation structures in DNA sequences is of great interest because it yields structural and functional information about the underlying genetic mechanisms. In this paper we present a study of the correlation structure of protein-coding DNA sequences based on a recently developed mathematical representation of the genetic code. A fundamental consequence of this representation is that each codon can be assigned a parity class (odd or even). The parity can be obtained by means of a nonlinear algorithm acting on the chemical character of the codon bases. In the same setting, Rumer's class can be naturally described, and a new dichotomic class, the hidden class, can be defined. Moreover, we show that the set of DNA base transformations associated with the three dichotomic classes can be put in a compact group-theoretic framework. We use the dichotomic classes as a coding scheme for DNA sequences and study the mutual dependence between these classes. The same analysis is also carried out on the chemical dichotomies of DNA bases. In both cases, the statistical analysis is performed using an entropy-based dependence metric possessing many desirable properties. We obtain meaningful tests for mutual dependence by using suitable resampling techniques. We find strong short-range correlations between certain combinations of dichotomic codon classes. These results support our previous hypothesis that codon classes might play an active role in the organization of genetic information.
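The general shape of such an analysis (recode codons as binary classes, measure dependence at a given lag, test it by resampling) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: Rumer's class is encoded here by whether the first two bases of a codon form a fourfold-degenerate root in the standard genetic code; plain mutual information stands in for the entropy-based dependence metric, which the abstract does not specify; and a simple shuffle test stands in for the resampling scheme. The parity and hidden classes are omitted because their defining algorithm is not given here. All function names are hypothetical.

```python
import math
import random
from collections import Counter

# Assumption: the eight dinucleotide roots whose codon family is fourfold
# degenerate in the standard genetic code (Leu, Val, Ser, Pro, Thr, Ala, Arg, Gly).
RUMER_FOURFOLD_ROOTS = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def rumer_sequence(dna):
    """Recode an in-frame DNA string as 0/1 per codon (1 = fourfold-degenerate root)."""
    dna = dna.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3  # drop a trailing incomplete codon
    return [
        1 if dna[i:i + 2] in RUMER_FOURFOLD_ROOTS else 0
        for i in range(0, usable, 3)
    ]


def mutual_information(xs, ys):
    """Plug-in mutual information (in bits) between two equal-length symbol lists."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def lagged_mi(seq, lag):
    """Mutual information between the sequence and itself shifted by `lag` codons (lag >= 1)."""
    return mutual_information(seq[:-lag], seq[lag:])


def permutation_p_value(seq, lag, n_perm=200, seed=0):
    """Shuffle test: how often does a shuffled sequence reach the observed lagged MI?"""
    rng = random.Random(seed)
    observed = lagged_mi(seq, lag)
    shuffled = list(seq)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # destroys ordering, keeps class frequencies
        if lagged_mi(shuffled, lag) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)
```

Calling `permutation_p_value(rumer_sequence(cds), lag=1)` on a coding sequence `cds` would return the observed dependence between adjacent codons and an approximate p-value. Shuffling preserves class frequencies while destroying order, so a small p-value points to sequential dependence rather than composition alone.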
DOI: http://dx.doi.org/10.1103/PhysRevE.78.051918