Primary sequences of proteins from complete genomes display a singular periodicity: Alignment-free N-gram analysis.

C R Biol

Interdisciplinary Centre for Mathematical and Computational Modelling, Warsaw University, Pawińskiego 5A, Bldg. D, 02106 Warsaw, Poland.

Published: January 2007

A method is proposed to represent and analyze complete genome sequences (52 species, both prokaryotes and eukaryotes), based on n-gram frequencies of amino acid pairs (bigrams) separated by a given number of other residues. For each species analyzed, the method yields over-abundant and over-deficient occurrence profiles that summarize amino acid bigram frequencies over the entire genome. It deals efficiently with the sparseness of statistical representations of individual sequences and describes every gene sequence in the same way, independently of its length and of genome size. The frequency of over-abundant and over-deficient bigram occurrences shows a singular periodicity of about 3.5 peptide bonds, suggesting a relation to the alpha-helical secondary structure.
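The gapped-bigram counting described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`gapped_bigram_counts`, `abundance_profile`), the independence-based expected count, the z-like score, and the threshold of 3.0 are all assumptions chosen for illustration, since the abstract does not state which statistic defines "over-abundant" or "over-deficient".

```python
from collections import Counter
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def gapped_bigram_counts(sequences, gap):
    """Count residue pairs (a, b) with exactly `gap` other residues between them."""
    counts = Counter()
    step = gap + 1  # distance between the two positions of the pair
    for seq in sequences:
        for i in range(len(seq) - step):
            a, b = seq[i], seq[i + step]
            # Skip ambiguous or non-standard symbols such as 'X'.
            if a in AMINO_ACIDS and b in AMINO_ACIDS:
                counts[a + b] += 1
    return counts


def abundance_profile(sequences, max_gap, threshold=3.0):
    """For each gap, count bigrams that look over- or under-represented.

    ASSUMPTION: the expected count comes from an independence model
    (product of marginal frequencies) and the score is
    (observed - expected) / sqrt(expected). The paper may use a
    different statistic; this is only a stand-in.
    """
    profile = []
    for gap in range(max_gap + 1):
        counts = gapped_bigram_counts(sequences, gap)
        total = sum(counts.values())
        first, second = Counter(), Counter()
        for pair, n in counts.items():
            first[pair[0]] += n
            second[pair[1]] += n
        over = under = 0
        for a in AMINO_ACIDS:
            for b in AMINO_ACIDS:
                expected = first[a] * second[b] / total if total else 0.0
                if expected == 0:
                    continue  # residue absent at this gap; no score defined
                z = (counts[a + b] - expected) / sqrt(expected)
                if z > threshold:
                    over += 1
                elif z < -threshold:
                    under += 1
        profile.append((gap, over, under))
    return profile


if __name__ == "__main__":
    # Toy input only; a real analysis would pool every protein of a genome.
    toy_proteome = ["MKTAYIAKQRQISFVKSHFSRQ", "MEELLKKAEELLKKA"]
    for gap, over, under in abundance_profile(toy_proteome, max_gap=6):
        print(gap, over, under)
```

Because counts are pooled over all sequences before scoring, a short protein contributes few pairs but is treated exactly like a long one, which is one plausible reading of how the method sidesteps the sparseness of individual sequences. Plotting the over- and under-represented counts against the gap would be the natural way to look for a periodicity such as the one reported.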


Source
http://dx.doi.org/10.1016/j.crvi.2006.11.001

