Motivation: To test whether protein folding constraints and secondary structure sequence preferences significantly reduce the space of amino acid words in proteins, we compared the frequencies of four- and five-amino acid word clumps (independent words) in proteins to the frequencies predicted by four random sequence models.
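As a rough illustration of the comparison described above, the following Python sketch counts word clumps in a set of sequences and computes the number of occurrences expected under one simple random model. It makes several assumptions that are not taken from the abstract: a clump is treated as a maximal run of overlapping occurrences of the same word, the random model is an independent-residue model based on overall amino acid composition (the abstract does not specify its four models), and the function names `count_clumps`, `composition`, and `expected_occurrences` are hypothetical.

```python
from collections import Counter
from math import prod


def count_clumps(seq, k):
    """Count clumps of each k-letter word in one sequence.

    Assumption: a clump is a maximal run of overlapping occurrences of
    the same word, so "AAAAAA" holds a single clump of "AAAA".
    """
    counts = Counter()
    last_start = {}  # most recent start position seen for each word
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        prev = last_start.get(word)
        # A new clump starts only if this occurrence does not overlap
        # the previous occurrence of the same word.
        if prev is None or i - prev >= k:
            counts[word] += 1
        last_start[word] = i
    return counts


def composition(seqs):
    """Overall residue frequencies across all sequences."""
    residues = Counter("".join(seqs))
    total = sum(residues.values())
    return {aa: n / total for aa, n in residues.items()}


def expected_occurrences(word, seqs, freqs):
    """Expected occurrences of `word` if residues were drawn independently.

    Assumption: this is a composition-only model and it counts raw
    occurrences, not clumps, so it is only a first approximation.
    """
    positions = sum(max(len(s) - len(word) + 1, 0) for s in seqs)
    return positions * prod(freqs.get(aa, 0.0) for aa in word)
```

A word whose observed clump count greatly exceeds this expectation would be a candidate for overrepresentation, although a real analysis would need a proper statistical test and a null model that accounts for clumping.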
Results: While the human proteome has many overrepresented word clumps, these words come from large protein families with biased compositions (e.g.
Pto is a member of a multigene family and encodes a serine/threonine kinase that mediates gene-for-gene resistance to strains of Pseudomonas syringae pv. tomato expressing avrPto. The inferred amino acid sequence of the Pto homologs from both resistant (LpimPth2 to LpimPth4) and susceptible (LescFen, LescPth2 to LescPth5) haplotypes suggested that most could encode functional serine/threonine kinases.