We find recurring amino-acid residue packing patterns, or spatial motifs, that are characteristic of protein structural families, by applying a novel frequent subgraph mining algorithm to graph representations of protein three-dimensional structure. Graph nodes represent amino acids, and edges are chosen in one of three ways: first, using a threshold for contact distance between residues; second, using Delaunay tessellation; and third, using the recently developed almost-Delaunay edges. For a set of graphs representing a protein family from the Structural Classification of Proteins (SCOP) database, subgraph mining typically identifies several hundred common subgraphs corresponding to spatial motifs that are frequently found in proteins in the family but rarely found outside of it. We find that some of the large motifs map onto known functional regions in two protein families explored in this study, namely serine proteases and kinases. We find that graphs based on almost-Delaunay edges significantly reduce the number of edges in the graph representation and hence offer a computational advantage, yet the patterns extracted from such graphs have a biological interpretation approximately equivalent to that of patterns extracted from distance-based graphs.
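The first edge rule described above, a contact-distance threshold between residues, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `contact_graph`, the use of one representative coordinate per residue, the 8.5 Å cutoff, and the toy coordinates are all assumptions made for illustration, since the abstract does not state them.

```python
from itertools import combinations
from math import dist


def contact_graph(residues, threshold=8.5):
    """Build a residue contact graph from one 3D point per residue.

    residues  -- list of (label, (x, y, z)) tuples; which atom the point
                 represents (e.g. C-alpha) is an assumption, not from the source
    threshold -- contact distance cutoff in angstroms (hypothetical value)

    Returns (nodes, edges): nodes maps index -> residue label, and edges is
    a set of (i, j) index pairs with i < j whose points lie within the cutoff.
    """
    nodes = {i: label for i, (label, _) in enumerate(residues)}
    edges = set()
    # Compare every unordered pair of residues once.
    for (i, (_, p)), (j, (_, q)) in combinations(enumerate(residues), 2):
        if dist(p, q) <= threshold:
            edges.add((i, j))
    return nodes, edges


# Toy coordinates, invented purely for illustration.
toy = [
    ("SER", (0.0, 0.0, 0.0)),
    ("HIS", (5.0, 0.0, 0.0)),
    ("ASP", (5.0, 6.0, 0.0)),
    ("GLY", (30.0, 0.0, 0.0)),
]
nodes, edges = contact_graph(toy)
print(sorted(edges))  # [(0, 1), (0, 2), (1, 2)]
```

In this toy input the first three residues are mutually within the cutoff and form a triangle, while the distant fourth residue stays isolated. The all-pairs loop is quadratic in the number of residues; Delaunay and almost-Delaunay edges, which the abstract reports as producing sparser graphs, would need a computational-geometry library and are not shown here.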
DOI: http://dx.doi.org/10.1089/cmb.2005.12.657
Cell Rep Phys Sci
November 2024
Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA.
Graph neural networks (GNNs) have emerged as powerful tools for representation learning. Their efficacy depends on their having an optimal underlying graph. In many cases, the most relevant information comes from specific subgraphs.
MethodsX
June 2025
Department of Networking & Communications, School of Computing, SRM Institute of Science and Technology, Kattankulathur, Chennai, India.
Forecasting student performance with precision is paramount for creating tailor-made interventions capable of boosting learning effectiveness. Most traditional student performance prediction models have difficulty dealing with multi-dimensional academic data, which can cause sub-optimal classification and yield only simple, generalized insights. To address these challenges, in this research we propose a new model, the Multi-dimensional Student Performance Prediction Model (MSPP), which is inspired by advanced data preprocessing and feature engineering techniques using deep learning.
BMC Bioinformatics
January 2025
School of Computer Science and Technology, University of Science and Technology of China, 443 Huangshan Road, Hefei, 230027, China.
Background: Drug-drug interactions (DDIs), especially antagonistic ones, present significant risks to patient safety, underscoring the urgent need for reliable prediction methods. Recently, substructure-based DDI prediction has garnered much attention due to the dominant influence of functional groups and substructures on drug properties. However, existing approaches face challenges regarding the insufficient interpretability of identified substructures and the isolation of chemical substructures.
Sci Rep
January 2025
Advanced Manufacturing Lab, ETH Zürich, Leonhardstrasse 21, 8092, Zurich, Switzerland.
The rapid advancements in additive manufacturing (AM) across different scales and material classes have enabled the creation of architected materials with highly tailored properties. Beyond geometric flexibility, multi-material AM further expands design possibilities by combining materials with distinct characteristics. While machine learning has recently shown great potential for the fast inverse design of lattice structures, its application has largely been limited to single-material systems.
J Biomed Inform
January 2025
Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, 02115, MA, USA; VA Boston Healthcare System, 150 S Huntington Ave, Boston, 02130, MA, USA.
Objective: Electronic health record (EHR) systems contain a wealth of clinical data stored as both codified data and free-text narrative notes (NLP). The complexity of EHR presents challenges in feature representation, information extraction, and uncertainty quantification. To address these challenges, we proposed an efficient Aggregated naRrative Codified Health (ARCH) records analysis to generate a large-scale knowledge graph (KG) for a comprehensive set of EHR codified and narrative features.