Protein structure, function, and evolution depend on local and collective epistatic interactions between amino acids. A powerful approach to defining these interactions is to construct models of couplings between amino acids that reproduce the empirical statistics (frequencies and correlations) observed in sequences comprising a protein family. The top couplings are then interpreted.
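As a rough illustration of the "empirical statistics (frequencies and correlations)" that such coupling models are fit to reproduce, the sketch below computes single-site frequencies, pair frequencies, and their connected correlations from a toy alignment. The alignment and all function names are hypothetical and chosen only for illustration; this is not the model-inference step described in the abstract, and real analyses typically add sequence weighting and pseudocounts.

```python
from collections import Counter

# Toy alignment (hypothetical data): each string is one aligned sequence,
# each character position is one alignment column.
msa = ["ACDA", "ACDA", "GCEA", "GTEA"]


def site_frequencies(msa, i):
    """Frequency f_i(a) of each amino acid a observed in column i."""
    counts = Counter(seq[i] for seq in msa)
    n = len(msa)
    return {aa: c / n for aa, c in counts.items()}


def pair_frequencies(msa, i, j):
    """Joint frequency f_ij(a, b) of each amino acid pair observed in columns i and j."""
    counts = Counter((seq[i], seq[j]) for seq in msa)
    n = len(msa)
    return {pair: c / n for pair, c in counts.items()}


def connected_correlations(msa, i, j):
    """C_ij(a, b) = f_ij(a, b) - f_i(a) * f_j(b), for the observed pairs only."""
    fi = site_frequencies(msa, i)
    fj = site_frequencies(msa, j)
    fij = pair_frequencies(msa, i, j)
    return {(a, b): f - fi[a] * fj[b] for (a, b), f in fij.items()}


if __name__ == "__main__":
    print(site_frequencies(msa, 0))
    print(connected_correlations(msa, 0, 2))  # columns 0 and 2 co-vary
    print(connected_correlations(msa, 0, 3))  # column 3 is fully conserved
```

In this toy case, columns 0 and 2 vary together and yield nonzero correlations, while the fully conserved column 3 yields zero correlation with every other column, which is why conserved positions carry no pairwise signal on their own.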
The ribosome translates the genetic code into proteins in all domains of life. Its size and complexity demand long-range interactions that regulate ribosome function. These interactions are largely unknown.
The rational design of enzymes is an important goal for both fundamental and practical reasons. Here, we describe a process to learn the constraints for specifying proteins purely from evolutionary sequence data, design and build libraries of synthetic genes, and test them for activity in vivo using a quantitative complementation assay. For chorismate mutase, a key enzyme in the biosynthesis of aromatic amino acids, we demonstrate the design of natural-like catalytic function with substantial sequence diversity.
The design and synthesis of novel genes and deoxyribonucleic acid (DNA) sequences is a central technique in synthetic biology. Current methods of high-throughput gene synthesis use pooled oligonucleotides obtained from custom-designed DNA microarray chips, and rely on orthogonal (non-interacting) polymerase chain reaction primers to specifically de-multiplex, by amplification, the precise subset of oligonucleotides necessary to assemble a full-length gene. The availability of a large validated set of mutually orthogonal primers is therefore a crucial reagent for high-throughput gene synthesis.
The sequence of events that initiates T cell signaling is dictated by the specificities and order of activation of the tyrosine kinases that signal downstream of the T cell receptor. Using a platform that combines exhaustive point mutagenesis of peptide substrates, bacterial surface display, cell sorting, and deep sequencing, we have defined the specificities of the first two kinases in this pathway, Lck and ZAP-70, for the T cell receptor ζ chain and the scaffold proteins LAT and SLP-76. We find that ZAP-70 selects its substrates by utilizing an electrostatic mechanism that excludes substrates with positively charged residues and favors LAT and SLP-76 phosphosites that are surrounded by negatively charged residues.
Statistical analysis of protein sequences indicates an architecture for natural proteins in which amino acids are engaged in a sparse, hierarchical pattern of interactions in the tertiary structure. This architecture might be a key and distinguishing feature of evolved proteins: a design principle providing not only for foldability and high-performance function but also for robustness to perturbation and the capacity for rapid adaptation to new selection pressures. Here, we describe an approach for systematically testing this design principle for natural-like proteins by (1) computational design of synthetic sequences that gradually add or remove constraints along the hierarchy of interacting residues and (2) experimental testing of the designed sequences for folding and biochemical function.
Allosteric coupling between protein domains is fundamental to many cellular processes. For example, Hsp70 molecular chaperones use ATP binding by their actin-like N-terminal ATPase domain to control substrate interactions in their C-terminal substrate-binding domain, a reaction that is critical for protein folding in cells. Here, we generalize the statistical coupling analysis to simultaneously evaluate co-evolution between protein residues and functional divergence between sequences in protein sub-families.
Statistical analyses of protein families reveal networks of coevolving amino acids that functionally link distantly positioned functional surfaces. Such linkages suggest a concept for engineering allosteric control into proteins: the intramolecular networks of two proteins could be joined across their surface sites such that the activity of one protein might control the activity of the other. We tested this idea by creating PAS-DHFR, a designed chimeric protein that connects a light-sensing signaling domain from a plant member of the Per/Arnt/Sim (PAS) family of proteins with Escherichia coli dihydrofolate reductase (DHFR).
Protein sequences evolve through random mutagenesis with selection for optimal fitness. Cooperative folding into a stable tertiary structure is one aspect of fitness, but evolutionary selection ultimately operates on function, not on structure. In the accompanying paper, we proposed a model for the evolutionary constraint on a small protein interaction module (the WW domain) through application of statistical coupling analysis (SCA), a statistical analysis of multiple sequence alignments.
Classical studies show that for many proteins, the information required for specifying the tertiary structure is contained in the amino acid sequence. Here, we attempt to define the sequence rules for specifying a protein fold by computationally creating artificial protein sequences using only statistical information encoded in a multiple sequence alignment and no tertiary structure information. Experimental testing of libraries of artificial WW domain sequences shows that a simple statistical energy function capturing coevolution between amino acid residues is necessary and sufficient to specify sequences that fold into native structures.
Erythropoietin receptor (EpoR) activation is crucial for mature red blood cell production. The murine EpoR can also be activated by gp55-P, the envelope protein of the polycythemic (P) strain of spleen focus forming virus (SFFV). Due to differences in the transmembrane (TM) sequence, gp55-A, the gp55 of the anemic (A) strain of SFFV, cannot efficiently activate the EpoR.
Predicting protein sequences that fold into specific native three-dimensional structures is a problem of great potential complexity. Although the complete solution is ultimately rooted in understanding the physical chemistry underlying the complex interactions between amino acid residues that determine protein stability, recent work shows that empirical information about these first principles is embedded in the statistics of protein sequence and structure databases. This review focuses on the use of 'knowledge-based' potentials derived from these databases in designing proteins.