Identifying and reducing error in cluster-expansion approximations of protein energies.

J Comput Chem

Department of Biology, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA.

Published: December 2010

Protein design involves searching a vast space for sequences that are compatible with a defined structure. This can pose significant computational challenges. Cluster expansion is a technique that can accelerate the evaluation of protein energies by generating a simple functional relationship between sequence and energy. The method consists of several steps. First, for a given protein structure, a training set of sequences with known energies is generated. Next, this training set is used to expand energy as a function of clusters consisting of single residues, residue pairs, and higher order terms, if required. The accuracy of the sequence-based expansion is monitored and improved using cross-validation testing and iterative inclusion of additional clusters. As a trade-off for evaluation speed, the cluster-expansion approximation causes prediction errors, which can be reduced by including more training sequences, including higher order terms in the expansion, and/or reducing the sequence space described by the cluster expansion. This article analyzes the sources of error and introduces a method whereby accuracy can be improved by judiciously reducing the described sequence space. The method is applied to describe the sequence-stability relationship for several protein structures: coiled-coil dimers and trimers, a PDZ domain, and T4 lysozyme as examples with computationally derived energies, and SH3 domains in amphiphysin-1 and endophilin-1 as examples where the expanded pseudo-energies are obtained from experiments. Our open-source software package Cluster Expansion Version 1.0 allows users to expand their own energy function of interest and thereby apply cluster expansion to custom problems in protein design.
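
The fitting and cross-validation steps described above can be illustrated with a minimal sketch. This is not the authors' Cluster Expansion Version 1.0 package: the three-letter alphabet, the four-site sequence, the randomly generated toy energy function, and every name in the code (`features`, `make_toy_energy`, `cv_rmse`) are assumptions made for illustration. The sketch fits a point-only expansion and a point-plus-pair expansion by ordinary least squares and compares their cross-validated errors.

```python
import itertools
import numpy as np

ALPHABET = "ALK"  # toy alphabet (assumption); the first letter is the reference residue
N_SITES = 4       # toy sequence length (assumption)


def features(seq, use_pairs):
    """Cluster functions: constant, single-site indicators, optional pair products."""
    point = [float(seq[i] == a) for i in range(N_SITES) for a in ALPHABET[1:]]
    row = [1.0] + point
    if use_pairs:
        per_site = len(ALPHABET) - 1
        for i, j in itertools.combinations(range(N_SITES), 2):
            for p in range(per_site):
                for q in range(per_site):
                    row.append(point[i * per_site + p] * point[j * per_site + q])
    return row


def make_toy_energy(rng):
    """Hypothetical stand-in for an expensive structure-based energy function."""
    n_res = len(ALPHABET)
    single = rng.normal(size=(N_SITES, n_res))
    pair = rng.normal(size=(N_SITES, N_SITES, n_res, n_res))

    def energy(seq):
        idx = [ALPHABET.index(c) for c in seq]
        e = sum(single[i, idx[i]] for i in range(N_SITES))
        e += sum(pair[i, j, idx[i], idx[j]]
                 for i, j in itertools.combinations(range(N_SITES), 2))
        # Weak three-body term that a pair-level expansion cannot capture exactly.
        e += 0.1 * float(idx[0] == idx[1] == idx[2])
        return e

    return energy


def cv_rmse(seqs, energies, use_pairs, n_folds=5, seed=0):
    """k-fold cross-validated RMSE of a least-squares cluster expansion."""
    X = np.array([features(s, use_pairs) for s in seqs])
    y = np.asarray(energies)
    order = np.random.default_rng(seed).permutation(len(seqs))
    sq_err = []
    for fold in np.array_split(order, n_folds):
        train = np.setdiff1d(order, fold)  # every index not in the held-out fold
        coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        sq_err.extend((X[fold] @ coef - y[fold]) ** 2)
    return float(np.sqrt(np.mean(sq_err)))


rng = np.random.default_rng(1)
energy = make_toy_energy(rng)
# The toy space is small enough to enumerate; a real design problem would sample it.
seqs = ["".join(s) for s in itertools.product(ALPHABET, repeat=N_SITES)]
energies = [energy(s) for s in seqs]

rmse_point = cv_rmse(seqs, energies, use_pairs=False)
rmse_pair = cv_rmse(seqs, energies, use_pairs=True)
print(f"point-only CV RMSE: {rmse_point:.3f}")
print(f"point+pair CV RMSE: {rmse_pair:.3f}")
```

In this toy setting, adding pair clusters should lower the cross-validated error, while the weak three-body term leaves a residual that only higher-order clusters could remove. Each residue is encoded relative to a reference residue so that the cluster functions stay linearly independent. That is one common encoding choice, not necessarily the one used in the paper.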

Source
http://dx.doi.org/10.1002/jcc.21585
