Large property models: a new generative machine-learning formulation for molecules.

Faraday Discuss

Department of Chemical and Biomolecular Engineering, The University of Notre Dame, Notre Dame, Indiana, USA.

Published: September 2024

AI Article Synopsis

  • Generative models for the inverse design of molecules have not yet shown significant gains over machine-learning-augmented expert intuition, largely because they predict molecules with targeted properties poorly when data are scarce.
  • A major challenge is that there are often very few samples for desired properties, making it hard to accurately map properties to molecular structures.
  • The authors hypothesize that the property-to-structure mapping becomes unique once enough properties are supplied during training, and they introduce transformers called "large property models" (LPMs) that are trained with abundant, relatively basic chemical property data.

Article Abstract

Generative models for the inverse design of molecules with particular properties have been heavily hyped, but have yet to demonstrate significant gains over machine-learning-augmented expert intuition. A major challenge of such models is their limited accuracy in predicting molecules with targeted properties in the data-scarce regime, which is the regime typical of the prized outliers that it is hoped inverse models will discover. For example, activity data for a drug target or stability data for a material may only number in the tens to hundreds of samples, which is insufficient to learn an accurate and reasonably general property-to-structure inverse mapping from scratch. We have hypothesized that the property-to-structure mapping becomes unique when a sufficient number of properties are supplied to the models during training. This hypothesis has several important corollaries if true. It would imply that data-scarce properties can be completely determined using a set of more accessible molecular properties. It would also imply that a generative model trained on multiple properties would exhibit an accuracy phase transition after achieving a sufficient size, a process analogous to what has been observed in the context of large language models. To interrogate these behaviors, we have built the first transformers trained on the property-to-molecular-graph task, which we dub "large property models" (LPMs). A key ingredient is supplementing these models during training with relatively basic but abundant chemical property data. The motivation for the large-property-model paradigm, the model architectures, and case studies are presented here.
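The uniqueness hypothesis in the abstract can be illustrated with a toy lookup, which is not the authors' transformer or dataset. The sketch below assumes a hand-picked set of five small molecules and three simple descriptors counted directly from their structures; the names `MOLECULES`, `max_degeneracy`, and `candidates` are hypothetical and chosen only for this illustration. It shows that as more properties are specified, fewer structures remain consistent with them.

```python
from collections import Counter

# Toy table (illustrative assumption, not data from the paper).
# Each descriptor is a simple count read off the structural formula.
MOLECULES = {
    "ethanol":            {"heavy_atoms": 3, "h_bond_donors": 1, "methyl_groups": 1},
    "dimethyl ether":     {"heavy_atoms": 3, "h_bond_donors": 0, "methyl_groups": 2},
    "propan-1-ol":        {"heavy_atoms": 4, "h_bond_donors": 1, "methyl_groups": 1},
    "propan-2-ol":        {"heavy_atoms": 4, "h_bond_donors": 1, "methyl_groups": 2},
    "ethyl methyl ether": {"heavy_atoms": 4, "h_bond_donors": 0, "methyl_groups": 2},
}


def max_degeneracy(property_names):
    """Largest number of molecules that share one tuple of the given properties.

    A value of 1 means the chosen properties identify every molecule uniquely.
    """
    counts = Counter(
        tuple(desc[name] for name in property_names)
        for desc in MOLECULES.values()
    )
    return max(counts.values())


def candidates(target):
    """Names of all molecules whose descriptors match every entry in `target`."""
    return sorted(
        name
        for name, desc in MOLECULES.items()
        if all(desc[key] == value for key, value in target.items())
    )


if __name__ == "__main__":
    names = ["heavy_atoms", "h_bond_donors", "methyl_groups"]
    for n in range(1, len(names) + 1):
        subset = names[:n]
        print(subset, "-> max degeneracy:", max_degeneracy(subset))
```

In this toy set, the inverse lookup is ambiguous with one or two descriptors and becomes one-to-one only when all three are supplied. A real property-to-structure model would have to learn such a mapping over a vastly larger chemical space, which is the setting the abstract addresses.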

Source
http://dx.doi.org/10.1039/d4fd00113c
