Learning Molecular Representation in a Cell.

Gang Liu, Srijit Seal, John Arevalo, Zhenwen Liang, Anne E Carpenter, Meng Jiang, Shantanu Singh

ArXiv

Published: October 2024

Predicting drug efficacy and safety requires information on biological responses (e.g., cell morphology and gene expression) to small molecule perturbations. However, current molecular representation learning methods do not provide a comprehensive view of cell states under these perturbations and struggle to remove noise, hindering model generalization. We introduce the Information Alignment (InfoAlign) approach to learn molecular representations through the information bottleneck method in cells. We integrate molecules and cellular response data as nodes into a context graph, connecting them with weighted edges based on chemical, biological, and computational criteria. For each molecule in a training batch, InfoAlign optimizes the encoder's latent representation with a minimality objective to discard redundant structural information. A sufficiency objective decodes the representation to align with different feature spaces from the molecule's neighborhood in the context graph. We demonstrate that the proposed sufficiency objective for alignment is tighter than existing encoder-based contrastive methods. Empirically, we validate representations from InfoAlign in two downstream applications: molecular property prediction against up to 27 baseline methods across four datasets, plus zero-shot molecule-morphology matching.
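The minimality and sufficiency objectives described above can be illustrated with a generic information-bottleneck-style loss. The sketch below is not the authors' implementation. It assumes a diagonal Gaussian encoder whose KL divergence to a standard normal prior stands in for the minimality term, and an edge-weighted mean squared error between decoded features and neighbor features as a proxy for the sufficiency term. All function names, the `beta` trade-off weight, and the choice of squared error are hypothetical.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for one latent vector.

    Assumed stand-in for a minimality term: it penalizes information
    the latent code keeps beyond the prior.
    """
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def weighted_alignment_error(decoded, targets, weights):
    """Edge-weighted mean squared error over a molecule's graph neighbors.

    decoded[i] and targets[i] are feature vectors for neighbor i; their
    length may differ across neighbors (e.g. morphology vs. expression).
    weights[i] is the (hypothetical) context-graph edge weight.
    """
    total_weight = sum(weights)
    error = 0.0
    for pred, tgt, w in zip(decoded, targets, weights):
        mse = sum((p - t) ** 2 for p, t in zip(pred, tgt)) / len(tgt)
        error += w * mse
    return error / total_weight


def bottleneck_loss(mu, log_var, decoded, targets, weights, beta=0.1):
    """Sufficiency proxy plus beta times the minimality proxy."""
    sufficiency = weighted_alignment_error(decoded, targets, weights)
    minimality = kl_to_standard_normal(mu, log_var)
    return sufficiency + beta * minimality


if __name__ == "__main__":
    # One molecule with two neighbors in different feature spaces.
    loss = bottleneck_loss(
        mu=[0.2, -0.1],
        log_var=[0.0, 0.0],
        decoded=[[0.5, 0.4, 0.1], [1.0]],
        targets=[[0.6, 0.4, 0.0], [0.8]],
        weights=[0.7, 0.3],
    )
    print(loss)
```

Raising `beta` in this sketch pushes the latent code toward the prior (more compression), while lowering it favors reconstruction of the neighbors' features.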

Source: PMC, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11213146

