Publications by authors named "Jack Snoeyink"

This paper describes the current state of the macromolecular model validation services provided at the MolProbity website, emphasizing changes and additions since the previous review in 2010. There have been many infrastructure improvements, including the replacement of earlier Java utilities with existing or newly written Python utilities in the open-source CCTBX portion of the Phenix software system. This improves long-term maintainability and deepens the integration of MolProbity-style validation within Phenix.


Interactions between polar atoms are challenging to model because at very short ranges they form hydrogen bonds (H-bonds) that are partially covalent in character and exhibit strong orientation preferences; at longer ranges the orientation preferences are lost, but significant electrostatic interactions between charged and partially charged atoms remain. To model these two types of behavior simultaneously, we refined an orientation-dependent model of hydrogen bonds [Kortemme et al. J.


Accurate energy functions are critical to macromolecular modeling and design. We describe new tools for identifying inaccuracies in energy functions and guiding their improvement, and illustrate the application of these tools to the improvement of the Rosetta energy function. The feature analysis tool identifies discrepancies between structures deposited in the PDB and low-energy structures generated by Rosetta; these likely arise from inaccuracies in the energy function.


We describe a new approach for inferring the functional relationships between nonhomologous protein families by looking at statistical enrichment of alternative function predictions in classification hierarchies such as Gene Ontology (GO) and Structural Classification of Proteins (SCOP). Protein structures are represented by robust graph representations, and the fast frequent subgraph mining algorithm is applied to protein families to generate sets of family-specific packing motifs, i.e.


This paper describes several case studies on inferring protein function from structure using the novel approach described in the accompanying paper. This approach employs family-specific motifs, i.e.


Protein function prediction is one of the central problems in computational biology. We present a novel automated protein structure-based function prediction method using libraries of local residue packing patterns that are common to most proteins in a known functional family. Critical to this approach is the representation of a protein structure as a graph where residue vertices (residue name used as a vertex label) are connected by geometrical proximity edges.


Pairwise structure alignment commonly uses root mean square deviation (RMSD) to measure the structural similarity, and methods for optimizing RMSD are well established. We extend RMSD to weighted RMSD for multiple structures. By using multiplicative weights, we show that weighted RMSD for all pairs is the same as weighted RMSD to an average of the structures.


MolProbity is a general-purpose web server offering quality validation for 3D structures of proteins, nucleic acids and complexes. It provides detailed all-atom contact analysis of any steric problems within the molecules as well as updated dihedral-angle diagnostics, and it can calculate and display the H-bond and van der Waals contacts in the interfaces between components. An integral step in the process is the addition and full optimization of all hydrogen atoms, both polar and nonpolar.


Although accurate details in RNA structure are of great importance for understanding RNA function, the backbone conformation is difficult to determine, and most existing RNA structures show serious steric clashes (≥ 0.4 Å overlap) when hydrogen atoms are taken into account. We have developed a program called RNABC (RNA Backbone Correction) that performs local perturbations to search for alternative conformations that avoid those steric clashes or other local geometry problems.


Structure motifs are amino acid packing patterns that occur frequently within a set of protein structures. We define a labeled graph representation of protein structure in which vertices correspond to amino acid residues and edges connect pairs of residues and are labeled by (1) the Euclidean distance between the Cα atoms of the two residues and (2) a boolean indicating whether the two residues are in physical/chemical contact. Using this representation, a structure motif corresponds to a labeled clique that occurs frequently among the graphs representing the protein structures.
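
The labeled graph described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_structure_graph`, the input layout, and the use of a simple Cα distance cutoff (`contact_cutoff`) as the contact test are hypothetical choices, since the abstract does not say how physical/chemical contact is decided.

```python
import math
from itertools import combinations

def build_structure_graph(residues, contact_cutoff=8.0):
    """Build a labeled graph from residues given as (name, (x, y, z)) pairs,
    where the coordinates are those of the C-alpha atom.

    Vertices are labeled with the residue name. Every pair of residues gets an
    edge labeled (distance, in_contact). The distance cutoff used for the
    contact flag is an assumption made for this sketch.
    """
    vertices = {i: name for i, (name, _) in enumerate(residues)}
    edges = {}
    for (i, (_, a)), (j, (_, b)) in combinations(enumerate(residues), 2):
        d = math.dist(a, b)                   # Euclidean C-alpha distance
        edges[(i, j)] = (d, d <= contact_cutoff)
    return vertices, edges

# Toy input, not taken from any real structure.
residues = [("ALA", (0.0, 0.0, 0.0)),
            ("GLY", (3.0, 4.0, 0.0)),
            ("SER", (30.0, 0.0, 0.0))]
vertices, edges = build_structure_graph(residues)
for pair, label in edges.items():
    print(pair, label)
```

In this representation a motif would be a set of vertices whose names and pairwise edge labels match across many graphs; the mining step itself is not shown.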


Root mean square deviation (RMSD) is often used to measure the difference between structures. We show mathematically that, for multiple structure alignment, the minimum RMSD (weighted at aligned positions or unweighted) for all pairs is the same as the RMSD to the average of the structures. Thus, using RMSD implies that the average is a consensus structure.
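
The relationship between all-pairs deviation and deviation from the average can be checked numerically. The sketch below covers the unweighted case only and assumes the structures are already superposed and have the same number of aligned positions; the function names are illustrative. For n structures it verifies that the sum of squared deviations over all pairs equals n times the sum of squared deviations to the average structure, so minimizing one quantity minimizes the other.

```python
from itertools import combinations

def sq_dist(a, b):
    """Sum of squared coordinate differences between two structures,
    each given as a list of (x, y, z) points of equal length."""
    return sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb))

def average_structure(structures):
    """Position-wise mean of the structures."""
    n = len(structures)
    length = len(structures[0])
    return [tuple(sum(s[k][c] for s in structures) / n for c in range(3))
            for k in range(length)]

def pairwise_sum(structures):
    """Sum of squared deviations over all unordered pairs of structures."""
    return sum(sq_dist(a, b) for a, b in combinations(structures, 2))

def to_average_sum(structures):
    """Sum of squared deviations from each structure to the average."""
    avg = average_structure(structures)
    return sum(sq_dist(s, avg) for s in structures)

# Toy coordinates, not taken from any real alignment.
structures = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)],
    [(0.0, 0.0, 2.0), (2.0, 0.0, 0.0)],
]
n = len(structures)
print(pairwise_sum(structures), n * to_average_sum(structures))
```

The two printed values agree up to floating-point rounding. Converting either sum to an RMSD only divides by a constant and takes a square root, so the minimizers coincide.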

Article Synopsis
  • Quantities from solvent accessible surface areas (SASA) are important in protein design, but their calculation is computationally expensive, making them hard to use in standard methods.
  • We present a new method to accurately maintain SASA during Monte Carlo searches for protein design by enhancing the existing Le Grand and Merz algorithm with a more efficient approach to updating SASA based on atom coverage.
  • Our optimized algorithm significantly reduces computation time (about 145 times faster for large proteins) while effectively optimizing protein packing using SASA measures in redesign projects.
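
To make the cost of SASA concrete, here is a minimal from-scratch dot-counting estimate in Python. It is a generic sketch, not the Le Grand and Merz algorithm and not the incremental update described in the synopsis: the function names, the Fibonacci-style dot placement, the 1.4 Å probe radius, and the dot count are all assumptions chosen for illustration.

```python
import math

def sphere_dots(n):
    """Return n roughly evenly spaced unit vectors (Fibonacci-style spiral)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    dots = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n
        r = math.sqrt(1.0 - z * z)
        dots.append((r * math.cos(golden_angle * i),
                     r * math.sin(golden_angle * i), z))
    return dots

def dot_sasa(atoms, probe=1.4, n_dots=200):
    """Estimate per-atom SASA for atoms given as (x, y, z, radius) tuples.

    A dot on an atom's probe-expanded sphere counts as exposed when it lies
    outside every other atom's probe-expanded sphere.
    """
    dots = sphere_dots(n_dots)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe
        exposed = 0
        for dx, dy, dz in dots:
            p = (xi + big_r * dx, yi + big_r * dy, zi + big_r * dz)
            if all(math.dist(p, (xj, yj, zj)) >= rj + probe
                   for j, (xj, yj, zj, rj) in enumerate(atoms) if j != i):
                exposed += 1
        areas.append(4.0 * math.pi * big_r * big_r * exposed / n_dots)
    return areas

# Two overlapping toy atoms; each shields part of the other.
print(dot_sasa([(0.0, 0.0, 0.0, 1.7), (2.0, 0.0, 0.0, 1.7)]))
```

Every call rechecks all dots against all atoms, which is what makes repeated evaluation inside a Monte Carlo search expensive and motivates updating only the coverage that a move changes.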

We describe a method to assign a protein structure to a functional family using family-specific fingerprints. Fingerprints represent amino acid packing patterns that occur in most members of a family but are rare in the background, a nonredundant subset of PDB; their information is additional to sequence alignments, sequence patterns, structural superposition, and active-site templates. Fingerprints were derived for 120 families in SCOP using Frequent Subgraph Mining.


We review schemes for dividing cubic cells into simplices (tetrahedra) for interpolating from sampled data to ℝ³, present visual and geometric artifacts generated in isosurfaces and volume renderings, and discuss how these artifacts relate to the filter kernels corresponding to the subdivision schemes.
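
As one concrete instance of such a scheme, the sketch below interpolates inside a unit cube using the six-tetrahedron subdivision often called the Kuhn (or Freudenthal) triangulation. Whether this particular scheme is among those reviewed is not stated in the abstract, and the function name and data layout are assumptions. Sorting the point's coordinates selects the tetrahedron, and the differences between sorted coordinates are the barycentric weights.

```python
from itertools import product

def kuhn_interpolate(corner_values, point):
    """Piecewise-linear interpolation inside the unit cube.

    corner_values maps each corner (i, j, k) with entries in {0, 1} to a
    sample value; point is (x, y, z) with each coordinate in [0, 1].
    """
    # Axes ordered by decreasing coordinate identify the tetrahedron.
    order = sorted(range(3), key=lambda axis: point[axis], reverse=True)
    c = [point[axis] for axis in order]
    # Barycentric weights for the path (0,0,0) -> ... -> (1,1,1).
    weights = [1.0 - c[0], c[0] - c[1], c[1] - c[2], c[2]]
    vertex = [0, 0, 0]
    result = weights[0] * corner_values[tuple(vertex)]
    for axis, w in zip(order, weights[1:]):
        vertex[axis] = 1                      # step along the next axis
        result += w * corner_values[tuple(vertex)]
    return result

# Sample a linear function at the corners; linear data is reproduced exactly.
corners = {(i, j, k): i + 2 * j + 3 * k for i, j, k in product((0, 1), repeat=3)}
print(kuhn_interpolate(corners, (0.2, 0.7, 0.5)))
```

Only four of the eight corners contribute to any given point, and which four depends on the chosen subdivision, which is one way a scheme can introduce directional artifacts.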


We find recurring amino-acid residue packing patterns, or spatial motifs, that are characteristic of protein structural families, by applying a novel frequent subgraph mining algorithm to graph representations of protein three-dimensional structure. Graph nodes represent amino acids, and edges are chosen in one of three ways: first, using a threshold for contact distance between residues; second, using Delaunay tessellation; and third, using the recently developed almost-Delaunay edges. For a set of graphs representing a protein family from the Structural Classification of Proteins (SCOP) database, subgraph mining typically identifies several hundred common subgraphs corresponding to spatial motifs that are frequently found in proteins in the family but rarely found outside of it.


Larger rotamer libraries, which provide a fine-grained discretization of side-chain conformation space by sampling near the canonical rotamers, allow protein designers to find better conformations but slow down the algorithms that search for them. We present a dynamic programming solution to the side-chain placement problem that treats rotamers at high or low resolution only as necessary. Dynamic programming is an exact technique; we turn it into an approximation, but can still analyze the error that can be introduced.
