Identifying causal mutations accelerates genetic disease diagnosis, and therapeutic development. Missense variants present a bottleneck in genetic diagnoses as their effects are less straightforward than truncations or nonsense mutations. While computational prediction methods are increasingly successful at prediction for variants in disease genes, they do not generalize well to other genes as the scores are not calibrated across the proteome.
View Article and Find Full Text PDFIdentifying causal mutations accelerates genetic disease diagnosis, and therapeutic development. Missense variants present a bottleneck in genetic diagnoses as their effects are less straightforward than truncations or nonsense mutations. While computational prediction methods are increasingly successful at prediction for variants in disease genes, they do not generalize well to other genes as the scores are not calibrated across the proteome.
View Article and Find Full Text PDFSummary: Coevolutionary sequence analysis has become a commonly used technique for de novo prediction of the structure and function of proteins, RNA, and protein complexes. We present the EVcouplings framework, a fully integrated open-source application and Python package for coevolutionary analysis. The framework enables generation of sequence alignments, calculation and evaluation of evolutionary couplings (ECs), and de novo prediction of structure and mutation effects.
View Article and Find Full Text PDFBinding small ligands such as ions or macromolecules such as DNA, RNA, and other proteins is one important aspect of the molecular function of proteins. Many binding sites remain without experimental annotations. Predicting binding sites on a per-residue level is challenging, but if 3D structures are known, information about coevolving residue pairs (evolutionary couplings) can predict catalytic residues through mutual information.
View Article and Find Full Text PDFThe TP53 gene is frequently mutated in human cancer. Research has focused predominantly on six major "hotspot" codons, which account for only ∼30% of cancer-associated p53 mutations. To comprehensively characterize the consequences of the p53 mutation spectrum, we created a synthetically designed library and measured the functional impact of ∼10,000 DNA-binding domain (DBD) p53 variants in human cells in culture and in vivo.
View Article and Find Full Text PDFThe shape, elongation, division and sporulation (SEDS) proteins are a large family of ubiquitous and essential transmembrane enzymes with critical roles in bacterial cell wall biology. The exact function of SEDS proteins was for a long time poorly understood, but recent work has revealed that the prototypical SEDS family member RodA is a peptidoglycan polymerase-a role previously attributed exclusively to members of the penicillin-binding protein family. This discovery has made RodA and other SEDS proteins promising targets for the development of next-generation antibiotics.
View Article and Find Full Text PDFMany high-throughput experimental technologies have been developed to assess the effects of large numbers of mutations (variation) on phenotypes. However, designing functional assays for these methods is challenging, and systematic testing of all combinations is impossible, so robust methods to predict the effects of genetic variation are needed. Most prediction methods exploit evolutionary sequence conservation but do not consider the interdependencies of residues or bases.
View Article and Find Full Text PDFProtein flexibility ranges from simple hinge movements to functional disorder. Around half of all human proteins contain apparently disordered regions with little 3D or functional information, and many of these proteins are associated with disease. Building on the evolutionary couplings approach previously successful in predicting 3D states of ordered proteins and RNA, we developed a method to predict the potential for ordered states for all apparently disordered proteins with sufficiently rich evolutionary information.
View Article and Find Full Text PDFAccurate determination of protein structure by NMR spectroscopy is challenging for larger proteins, for which experimental data are often incomplete and ambiguous. Evolutionary sequence information together with advances in maximum entropy statistical methods provide a rich complementary source of structural constraints. We have developed a hybrid approach (evolutionary coupling-NMR spectroscopy; EC-NMR) combining sparse NMR data with evolutionary residue-residue couplings and demonstrate accurate structure determination for several proteins 6-41 kDa in size.
View Article and Find Full Text PDFInsect odorant receptors (ORs) comprise an enormous protein family that translates environmental chemical signals into neuronal electrical activity. These heptahelical receptors are proposed to function as ligand-gated ion channels and/or to act metabotropically as G protein-coupled receptors (GPCRs). Resolving their signalling mechanism has been hampered by the lack of tertiary structural information and primary sequence similarity to other proteins.
View Article and Find Full Text PDFProtein-protein interactions are fundamental to many biological processes. Experimental screens have identified tens of thousands of interactions, and structural biology has provided detailed functional insight for select 3D protein complexes. An alternative rich source of information about protein interactions is the evolutionary sequence record.
View Article and Find Full Text PDFBackground: 20 years of improved technology and growing sequences now renders residue-residue contact constraints in large protein families through correlated mutations accurate enough to drive de novo predictions of protein three-dimensional structure. The method EVfold broke new ground using mean-field Direct Coupling Analysis (EVfold-mfDCA); the method PSICOV applied a related concept by estimating a sparse inverse covariance matrix. Both methods (EVfold-mfDCA and PSICOV) are publicly available, but both require too much CPU time for interactive applications.
View Article and Find Full Text PDFBackground: Any method that de novo predicts protein function should do better than random. More challenging, it also ought to outperform simple homology-based inference.
Methods: Here, we describe a few methods that predict protein function exclusively through homology.
Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high.
View Article and Find Full Text PDFGenomic sequences contain rich evolutionary information about functional constraints on macromolecules such as proteins. This information can be efficiently mined to detect evolutionary couplings between residues in proteins and address the long-standing challenge to compute protein three-dimensional structures from amino acid sequences. Substantial progress has recently been made on this problem owing to the explosive growth in available sequences and the application of global statistical methods.
View Article and Find Full Text PDFWe show that amino acid covariation in proteins, extracted from the evolutionary sequence record, can be used to fold transmembrane proteins. We use this technique to predict previously unknown 3D structures for 11 transmembrane proteins (with up to 14 helices) from their sequences alone. The prediction method (EVfold_membrane) applies a maximum entropy approach to infer evolutionary covariation in pairs of sequence positions within a protein family and then generates all-atom models with the derived pairwise distance constraints.
View Article and Find Full Text PDFThe evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge.
View Article and Find Full Text PDF