We find recurring amino-acid residue packing patterns, or spatial motifs, that are characteristic of protein structural families, by applying a novel frequent subgraph mining algorithm to graph representations of protein three-dimensional structure. Graph nodes represent amino acids, and edges are chosen in one of three ways: first, using a threshold for contact distance between residues; second, using Delaunay tessellation; and third, using the recently developed almost-Delaunay edges. For a set of graphs representing a protein family from the Structural Classification of Proteins (SCOP) database, subgraph mining typically identifies several hundred common subgraphs corresponding to spatial motifs that are frequently found in proteins in the family but rarely found outside of it. Some of the large motifs map onto known functional regions in the two protein families explored in this study, serine proteases and kinases. Graphs based on almost-Delaunay edges have significantly fewer edges and hence offer a computational advantage, yet the patterns extracted from them have a biological interpretation approximately equivalent to that of patterns extracted from distance-based graphs.
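The first of the three edge definitions, a contact-distance threshold, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `contact_graph`, the 8.5 Å default cutoff, the use of one representative point per residue, and the toy coordinates are all assumptions made for illustration.

```python
import math
from itertools import combinations

def contact_graph(residues, threshold=8.5):
    """Build a labeled graph from a list of (label, (x, y, z)) residues.

    Nodes are residue indices labeled with the amino-acid type. An edge joins
    two residues whose representative points lie within `threshold` angstroms.
    The cutoff value is an assumed default, not taken from the abstract.
    """
    nodes = {i: label for i, (label, _) in enumerate(residues)}
    edges = set()
    # Compare every unordered pair once; i < j holds for each stored edge.
    for (i, (_, p)), (j, (_, q)) in combinations(enumerate(residues), 2):
        if math.dist(p, q) <= threshold:
            edges.add((i, j))
    return nodes, edges

# Toy coordinates (hypothetical, not from any real structure).
toy = [
    ("SER", (0.0, 0.0, 0.0)),
    ("HIS", (3.0, 4.0, 0.0)),
    ("ASP", (3.0, 4.0, 12.0)),
    ("GLY", (0.0, 0.0, 6.0)),
]
nodes, edges = contact_graph(toy)
print(sorted(edges))
```

Because every residue pair within the cutoff becomes an edge, the edge count of such graphs grows quickly with the threshold, which is the cost that sparser edge definitions such as almost-Delaunay edges are meant to reduce.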


Source
http://dx.doi.org/10.1089/cmb.2005.12.657

