A method is described by which bioinformatic concepts and tools can be applied to the study of protein dynamic properties. Sequences are transformed into numerical strings by representing each amino acid by a residue specific average value of the crystallographic alpha carbon B factor. These dynamic sequences are then Fourier transformed.
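The encoding and transform described above can be sketched as follows. This is a minimal illustration, not the authors' software: the table `PLACEHOLDER_B` holds made-up numbers standing in for the residue-specific average B factors (the published values are not given here), and the mean-centring and 1/n normalisation are assumptions made for the example.

```python
import cmath

# HYPOTHETICAL placeholder values, NOT the published average B factors.
# Replace with a real residue-specific <B> table before drawing conclusions.
PLACEHOLDER_B = {aa: 10.0 + i for i, aa in enumerate("ACDEFGHIKLMNPQRSTVWY")}


def encode(sequence, table=PLACEHOLDER_B):
    """Rewrite an amino acid sequence as a list of per-residue numbers."""
    return [table[aa] for aa in sequence.upper()]


def fourier_coefficients(values):
    """Discrete Fourier transform of a mean-centred numerical sequence.

    Mean-centring and division by n are choices made for this sketch.
    """
    n = len(values)
    mean = sum(values) / n
    centred = [v - mean for v in values]
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(centred)) / n
        for k in range(n)
    ]


def power_spectrum(sequence, table=PLACEHOLDER_B):
    """Squared magnitude of each Fourier coefficient of the encoded sequence."""
    return [abs(c) ** 2 for c in fourier_coefficients(encode(sequence, table))]
```

For a strictly alternating sequence such as `power_spectrum("ACAC")`, all of the power falls in the highest-frequency coefficient, which is the behaviour one would expect from a period-two pattern.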
We apply methods of Artificial Intelligence and Machine Learning to protein dynamic bioinformatics. We rewrite the sequences of a large protein data set, containing both folded and intrinsically disordered molecules, using a representation developed previously, which encodes the intrinsic dynamic properties of the naturally occurring amino acids. We Fourier analyze the resulting sequences.
Using tools developed to study the dynamic bioinformatics of proteins, we are able to study the dynamic characteristics of very large numbers of protein sequences simultaneously. We study herein the distribution of protein sequences in a space determined by sequence mobility. It is shown that there are statistically significant differences in mobility distribution between folded sequences of different structural classes and between those and sequences of intrinsically disordered proteins.
Using recently developed methods for studying the bioinformatics of protein dynamics, we investigate differences in dynamic characteristics between the sequences of proteins that fall into different structural classes. It is shown that there is a clear differentiation of dynamic properties of sequences as a function of structural class. Taken together with previous results we have developed, the present work demonstrates that dynamic properties are associated with structural behavior in two ways.
We compare the sequences of folded and intrinsically disordered proteins (IDPs), using bioinformatic methods recently developed to study protein dynamic properties. We demonstrate that the two classes of sequences are organized in diametrically opposite ways with respect to long-length-scale dynamic properties. We further demonstrate a statistically significant difference between the amino acid compositions of folded and disordered proteins, which is expressed in dynamic properties.
Using bioinformatic methods for treating protein dynamics, developed in earlier work, we study the relationship between sequence mobility and dynamics in proteins. It is shown that sequence mobility drives a transition between two dynamic regimes in proteins, and that the specific details of this transition differ qualitatively between α-helical proteins and those in other structural classes. We examine the possibility that conformational switching is related to dynamic switching, by considering a specific system of sequences which exhibit the switching phenomenon.
We use a bioinformatic description of amino acid dynamic properties, based on residue-specific average B factors, to construct a dynamics-based, large-scale description of a space of protein sequences. We examine the relationship between that space and an independently constructed, structure-based space comprising the same sequences. It is demonstrated that structure and dynamics are only moderately correlated.
We examine the local and global properties of the average B-factor, 〈B〉, as a residue-specific indicator of protein dynamic characteristics. It has been shown that values of 〈B〉 for the 20 amino acids differ in a statistically significant manner, and that, while strongly determined by the static physical properties of amino acids, they also encode averaged information about the influence of global fold on single-residue dynamics. Therefore, complete sequences of amino acids also encode fold-related global dynamic information, in addition to the local information that arises from static physical properties.
Traditional approaches to sequence alignment are based on evolutionary ideas. As a result, they are prebiased toward results which are in accord with initial expectations. We present here a method of sequence alignment which is based entirely on the physical properties of the amino acids.
Proc Natl Acad Sci U S A
February 2017
We recently introduced a physically based approach to sequence comparison, the property factor method (PFM). In the present work, we apply the PFM approach to the study of a challenging set of sequences: the bacterial chemotaxis protein CheY, the N-terminal receiver domain of the nitrogen regulation protein NT-NtrC, and the sporulation response regulator Spo0F. These are all response regulators involved in signal transduction.
Proc Natl Acad Sci U S A
February 2016
The degree of informatic independence between the physical properties of amino acids as encoded in actual protein sequences is calculated. It is shown that no physical property can be identified that carries significantly less information than others and that the information overlap between different properties and different length scales along the sequence is essentially zero. These observations suggest that bioinformatic models based on arbitrarily selected sets of physical properties are inherently deficient.
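One common way to quantify information overlap between two encodings of the same sequence is mutual information. The sketch below is a generic plug-in estimate over discrete symbols, offered only as an illustration; the abstract does not say how the independence calculation was actually done, and the function name `mutual_information` and the coarse H/L labelling in the usage note are assumptions.

```python
from collections import Counter
import math


def mutual_information(xs, ys):
    """Plug-in mutual information (in bits) between two aligned symbol strings.

    xs and ys are equal-length sequences of discrete labels, for example the
    same protein sequence binned by two different physical properties.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )
```

With two hypothetical high/low labellings, `mutual_information("HHLL", "HHLL")` gives one bit (full overlap), while `mutual_information("HHLL", "HLHL")` gives zero, the situation the abstract describes as essentially no information overlap.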
We examine the utility of informatic-based methods in computational protein biophysics. To do so, we use newly developed metric functions to define completely independent sequence and structure spaces for a large database of proteins. By investigating the relationship between these spaces, we demonstrate quantitatively the limits of knowledge-based correlation between the sequences and structures of proteins.
Proc Natl Acad Sci U S A
April 2015
The relationship between protein sequence and structure arises entirely from amino acid physical properties. An alternative method is therefore proposed to identify homologs in which residue equivalence is based exclusively on the pairwise physical property similarities of sequences. This approach, the property factor method (PFM), is entirely different from those in current use.
We show that a Fourier-based sequence distance function is able to identify structural homologs of target sequences with high accuracy. It is shown that Fourier distances correlate very strongly with independently determined structural distances between molecules, a property of the method that is not attainable using conventional representations. It is further shown that the ability of the Fourier approach to identify protein folds is statistically far in excess of random expectation.
The UNited RESidue (UNRES) coarse-grained model of polypeptide chains, developed in our laboratory, enables us to carry out millisecond-scale molecular-dynamics simulations of large proteins effectively. It performs well in predictions of protein structure, as demonstrated in the last Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP10). However, the resolution of the simulated structure is too coarse, especially in loop regions, which results from insufficient specificity of the model of local interactions.
Proc Natl Acad Sci U S A
September 2013
The performance of the physics-based protocol, whose main component is the United Residue (UNRES) physics-based coarse-grained force field, developed in our laboratory for the prediction of protein structure from amino acid sequence, is illustrated. Candidate models are selected, based on probabilities of the conformational families determined by multiplexed replica-exchange simulations, from the 10th Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP10). For target T0663, classified as a new fold, which consists of two domains homologous to those of known proteins, UNRES predicted the correct symmetry of packing, in which the domains are rotated with respect to each other by 180° in the experimental structure.
Delineation of the relationship between sequence and structure in proteins has proven elusive. Most studies of this problem use alignment methods and other approaches based on the characteristics of individual residues. It is demonstrated herein that the sequence-structure relationship is determined in significant part by global characteristics of sequence organization.
Methods Mol Biol
January 2013
Analysis of the global properties of protein sequences, rather than single-site or local properties, has been shown to lead to new understanding of folding and function. Here we describe the use of software which can describe sequences numerically in an orthonormal fashion, Fourier-analyze those sequences, and verify the statistical significance of the resulting Fourier coefficients. The resulting parameters can be used to study problems involving sequences from a unique perspective.
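One generic way to check whether a Fourier coefficient is statistically significant is a permutation test: shuffle the residue order, which preserves composition but destroys sequence organization, and see how often the shuffled power matches or exceeds the observed one. The sketch below illustrates that idea only; it is not the software described in the chapter, and the names `power_at` and `permutation_p_value`, the shuffle count, and the seed are all assumptions.

```python
import cmath
import random


def power_at(values, k):
    """Squared magnitude of the k-th Fourier coefficient (mean-centred, 1/n scaled)."""
    n = len(values)
    mean = sum(values) / n
    coeff = sum((x - mean) * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(values)) / n
    return abs(coeff) ** 2


def permutation_p_value(values, k, n_shuffles=1000, seed=0):
    """Fraction of shuffled sequences whose power at k reaches the observed power.

    Shuffling keeps the composition fixed, so a small p-value points to the
    ordering of the values, not their composition, as the source of the signal.
    """
    rng = random.Random(seed)
    observed = power_at(values, k)
    shuffled = list(values)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if power_at(shuffled, k) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)
```

A numerically encoded sequence with a strong periodicity should return a small p-value at the matching wavenumber, while a sequence with no particular organization should not.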
The existence of conformational switching in proteins, induced by single amino acid mutations, presents an important challenge to our understanding of the physics of protein folding. Sequence-local methods, commonly used to detect structural homology, are incapable of accounting for this phenomenon. We examine a set of proteins, derived from the G(A) and G(B) domains of Streptococcus protein G, which are known to show a dramatic conformational change as a result of single-residue replacement.
The effectiveness of sequence alignment in detecting structural homology among protein sequences decreases markedly when pairwise sequence identity is low (the so-called "twilight zone" problem of sequence alignment). Alternative sequence comparison strategies able to detect structural kinship among highly divergent sequences are necessary to address this need. Among them are alignment-free methods, which use global sequence properties (such as amino acid composition) to identify structural homology in a rapid and straightforward way.
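As a minimal illustration of an alignment-free comparison based on amino acid composition, the sketch below represents each sequence as a 20-component frequency vector and compares two sequences by Euclidean distance. The choice of Euclidean distance and the function names are assumptions made for the example; the abstract does not specify which composition-based measure was used.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Fraction of each of the 20 standard amino acids in the sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def composition_distance(seq_a, seq_b):
    """Euclidean distance between two composition vectors (no alignment needed)."""
    return math.dist(composition(seq_a), composition(seq_b))
```

Because only composition enters, `composition_distance("ACAC", "CACA")` is zero even though the two strings differ at every position. That insensitivity to residue order is what makes the approach fast, and it is also its main limitation.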
Proc Natl Acad Sci U S A
May 2010
Computational studies of the relationships between protein sequence, structure, and folding have traditionally relied on purely local sequence representations. Here we show that global representations, on the basis of parameters that encode information about complete sequences, contain otherwise inaccessible information about the organization of sequences. By studying the spectral properties of these parameters, we demonstrate that amino acid physical properties fall into two distinct classes.
Using information-theoretic concepts, we examine the role of the reference state, a crucial component of empirical potential functions, in protein fold recognition. We derive an information-based connection between the probability distribution functions of the reference state and those that characterize the decoy set used in threading. In examining commonly used contact reference states, we find that the quasi-chemical approximation is informatically superior to other variant models designed to include characteristics of real protein chains, such as finite length and variable amino acid composition from protein to protein.
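For orientation, a quasi-chemical reference state in its usual textbook form takes the expected number of contacts between residue types as proportional to the product of their mole fractions, and scores each contact type by the negative log ratio of observed to expected counts. The sketch below follows that generic form under stated assumptions: the function name, the dictionary layout, and the toy two-letter alphabet in the usage note are illustrative and are not taken from the paper.

```python
import math


def quasi_chemical_energies(contact_counts, mole_fractions):
    """Contact energies (in units of kT) against a quasi-chemical reference state.

    contact_counts maps an unordered pair written as a sorted tuple, e.g.
    ("A", "B"), to a positive observed contact count. mole_fractions maps each
    residue type to its fraction. Unlike pairs get a factor of 2 because
    (a, b) and (b, a) are the same contact.
    """
    total = sum(contact_counts.values())
    energies = {}
    for (a, b), observed in contact_counts.items():
        if observed <= 0:
            raise ValueError("observed counts must be positive")
        multiplicity = 1 if a == b else 2
        expected = total * multiplicity * mole_fractions[a] * mole_fractions[b]
        energies[(a, b)] = -math.log(observed / expected)
    return energies
```

With equal mole fractions of two hypothetical types A and B, observed counts of 25, 50, and 25 for A-A, A-B, and B-B match the reference exactly and every energy is zero. Shifting contacts toward like pairs makes the like-pair energies negative and the unlike-pair energy positive.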