Protein Binding Site Representation in Latent Space.

Mol Inform

Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 4, 8093, Zürich, Switzerland.

Published: January 2025

Interpretability and reliability of deep learning models are important for computer-based drug discovery. Aiming to understand feature perception by such a model, we investigate a graph neural network for affinity prediction of protein-ligand complexes. We assess a latent representation of ligand binding sites and investigate underlying geometric structure in this latent space and its relation to protein function. We introduce an automated computational pipeline for dimensionality reduction, clustering, hypothesis testing, and visualization of latent space. The results indicate that the learned protein latent space is inherently structured and not randomly distributed. Several of the identified protein binding site clusters in latent space correspond to functional protein families. Ligand size was found to be a determinant of cluster geometry. The computational pipeline proved applicable to latent space analysis and interpretation and can be adapted to work for different datasets and deep learning models.

Download full-text PDF

Source
http://dx.doi.org/10.1002/minf.202400205DOI Listing
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11733832PMC

Publication Analysis

Top Keywords

latent space
24
protein binding
8
binding site
8
deep learning
8
learning models
8
computational pipeline
8
latent
7
space
6
protein
5
site representation
4

Similar Publications

Addressing the issues of inadequate information exchange among subsequences in the operational time series of water injection pumps, leading to low accuracy and high false alarm rates in anomaly detection, this paper proposes a multidimensional time series anomaly detection method for water injection pump operations, leveraging Long Short-Term Memory Autoencoder augmented with Attention Mechanism (LSTMA-AE) and mechanistic constraints. The LSTMA-AE framework encompasses three primary modules: a Time Feature Extraction Module (Encoder), an Attention Layer, and a Data Reconstruction Module (Decoder). The Encoder captures temporal dependencies and features within the input sequences, mapping the input data into a higher-dimensional space.

View Article and Find Full Text PDF

Variational graph autoencoder for reconstructed transcriptomic data associated with NLRP3 mediated pyroptosis in periodontitis.

Sci Rep

January 2025

Department of Basic Sciences, Faculty of Dentistry, Universidad de Antioquia U de A, Medellín, 050010, Colombia.

The NLRP3 inflammasome, regulated by TLR4, plays a pivotal role in periodontitis by mediating inflammatory cytokine release and bone loss induced by Porphyromonas gingivalis. Periodontal disease creates a hypoxic environment, favoring anaerobic bacteria survival and exacerbating inflammation. The NLRP3 inflammasome triggers pyroptosis, a programmed cell death that amplifies inflammation and tissue damage.

View Article and Find Full Text PDF

Directed evolution of antimicrobial peptides using multi-objective zeroth-order optimization.

Brief Bioinform

November 2024

School of Computer Science and Technology, Harbin Institute of Technology, HIT Campus, Shenzhen University Town, Nanshan District, Shenzhen 518055, Guangdong, China.

Antimicrobial peptides (AMPs) emerge as a type of promising therapeutic compounds that exhibit broad spectrum antimicrobial activity with high specificity and good tolerability. Natural AMPs usually need further rational design for improving antimicrobial activity and decreasing toxicity to human cells. Although several algorithms have been developed to optimize AMPs with desired properties, they explored the variations of AMPs in a discrete amino acid sequence space, usually suffering from low efficiency, lack diversity, and local optimum.

View Article and Find Full Text PDF

Single-cell multi-omics techniques, which enable the simultaneous measurement of multiple modalities such as RNA gene expression and Assay for Transposase-Accessible Chromatin (ATAC) within individual cells, have become a powerful tool for deciphering the intricate complexity of cellular systems. Most current methods rely on motif databases to establish cross-modality relationships between genes from RNA-seq data and peaks from ATAC-seq data. However, these approaches are constrained by incomplete database coverage, particularly for novel or poorly characterized relationships.

View Article and Find Full Text PDF

The 12-lead electrocardiogram (ECG) is inexpensive and widely available. Whether conditions across the human disease landscape can be detected using the ECG is unclear. We developed a deep learning denoising autoencoder and systematically evaluated associations between ECG encodings and ~1,600 Phecode-based diseases in three datasets separate from model development, and meta-analyzed the results.

View Article and Find Full Text PDF

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!