FEP Augmentation as a Means to Solve Data Paucity Problems for Machine Learning in Chemical Biology.

J Chem Inf Model

Avicenna Biosciences Inc., 101 W. Chapel Hill Street, Suite 210, Durham, North Carolina 27701, United States.

Published: May 2024

In the realm of medicinal chemistry, the primary objective is to swiftly optimize a multitude of chemical properties of a set of compounds to yield a candidate poised for clinical trials. In recent years, two computational techniques, machine learning (ML) and physics-based methods, have evolved substantially and are now frequently incorporated into the medicinal chemist's toolbox to enhance the efficiency of both hit optimization and candidate design. Each method has its own limitations, and the two are often used independently of each other. ML's capability to screen extensive compound libraries expediently is tempered by its reliance on quality data, which can be scarce, especially during early-stage optimization. In contrast, physics-based approaches such as free energy perturbation (FEP) are frequently constrained by comparatively low throughput and high cost; however, they are capable of making highly accurate binding affinity predictions. In this study, we harnessed the strength of FEP to overcome data paucity in ML by generating virtual activity data sets that then inform the training of algorithms. Here, we show that ML algorithms trained with an FEP-augmented data set could achieve predictive accuracy comparable to that of algorithms trained on experimental data from biological assays. Throughout the paper, we emphasize key mechanistic considerations that must be taken into account when aiming to augment data sets, and we lay the groundwork for successful implementation. Ultimately, the study advocates for the synergy of physics-based methods and ML to expedite the lead optimization process. We believe that physics-based augmentation of ML will significantly benefit drug discovery as these techniques continue to evolve.
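The augmentation idea in the abstract can be illustrated with a small sketch. It is not the authors' pipeline, and the abstract does not say which ML algorithms they used. The sketch assumes relative FEP results (ΔΔG versus a reference ligand with a measured affinity, in kcal/mol), maps them onto the pKd scale via ΔG = RT ln Kd, pools these virtual labels with the experimental ones, and fits a similarity-weighted k-nearest-neighbour regressor on Tanimoto similarity. The fingerprints (plain sets of bit indices), the toy data, and the helper names `ddg_to_pkd`, `tanimoto`, `build_augmented_set`, and `knn_predict` are all hypothetical. A real workflow would more likely use cheminformatics fingerprints and a library model.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL_PER_MOL_K = 1.987204e-3


def ddg_to_pkd(ddg_kcal, ref_pkd, temperature_k=298.15):
    """Map a relative binding free energy (ddG vs. a reference ligand, kcal/mol)
    onto the pKd scale, anchored on the reference ligand's measured pKd.

    From dG = RT ln Kd it follows that pKd = -dG / (RT ln 10), so a negative
    ddG (tighter binding than the reference) raises the predicted pKd.
    """
    return ref_pkd - ddg_kcal / (R_KCAL_PER_MOL_K * temperature_k * math.log(10))


def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def build_augmented_set(experimental, fep_relative, ref_pkd):
    """Pool measured labels with FEP-derived ("virtual") labels.

    experimental: list of (fingerprint, measured pKd)
    fep_relative: list of (fingerprint, ddG vs. the reference, kcal/mol)
    Both sources are weighted equally here (a simplifying assumption).
    """
    augmented = list(experimental)
    augmented.extend((fp, ddg_to_pkd(ddg, ref_pkd)) for fp, ddg in fep_relative)
    return augmented


def knn_predict(train, query, k=3):
    """Similarity-weighted k-nearest-neighbour regression on Tanimoto similarity."""
    neighbours = sorted(train, key=lambda item: tanimoto(item[0], query), reverse=True)[:k]
    weights = [tanimoto(fp, query) for fp, _ in neighbours]
    total = sum(weights)
    if total == 0.0:
        # No bit overlap with any neighbour: fall back to an unweighted mean.
        return sum(y for _, y in neighbours) / len(neighbours)
    return sum(w * y for w, (_, y) in zip(weights, neighbours)) / total


# Hypothetical toy data: fingerprints are sets of "on" bit indices.
reference = (frozenset({1, 2, 3, 4}), 6.0)  # measured reference ligand (pKd)
experimental = [reference, (frozenset({1, 2, 5, 6}), 5.2)]
fep_relative = [
    (frozenset({1, 2, 3, 7}), -1.3642),  # FEP: about 1 log unit tighter than reference
    (frozenset({1, 2, 3, 8}), 0.6821),   # FEP: about 0.5 log unit weaker
]

train = build_augmented_set(experimental, fep_relative, ref_pkd=reference[1])
print(round(knn_predict(train, frozenset({1, 2, 3, 9}), k=3), 2))  # prints 6.17
```

Anchoring on a measured reference matters because relative FEP determines affinities only up to an additive constant, so the absolute scale has to come from experiment. The sketch weights FEP-derived and experimental labels equally, which is a simplification; down-weighting the virtual labels is an obvious variant.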

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11094716
DOI: http://dx.doi.org/10.1021/acs.jcim.4c00071

Similar Publications

The 3D structure of RNA critically influences its functionality, and understanding this structure is vital for deciphering RNA biology. Experimental methods for determining RNA structures are labour-intensive, expensive, and time-consuming. Computational approaches have emerged as valuable tools, leveraging physics-based principles and machine learning to predict RNA structures rapidly.

Estimating seismic anisotropy parameters, such as Thomsen's parameters, is crucial for investigating fractured and finely layered geological media. However, many inversion methods rely on complex physical models with initial assumptions, leading to non-reproducible estimates and subjective fracture interpretation. To address these limitations, this study uses machine learning methods: support vector regression, extreme gradient boosting, a multi-layer perceptron, and a convolutional neural network.

Microscale Electrical Resistivity Measurements to Investigate Particle Distribution.

Langmuir

January 2025

Materials Science and Engineering, Drexel University, 3141 Chestnut Street, Philadelphia, Pennsylvania 19104, United States.

The functional performance of a particulate thin film depends greatly on the particle distribution that forms during drying. In situ methods for monitoring the impact of different processing parameters on the distribution of particles currently require expensive and specialized equipment. This work addresses this gap by miniaturizing a geophysical prospecting method to thin-film applications.

The discrete empirical interpolation method (DEIM) is well-established as a means of performing model order reduction in approximating solutions to differential equations, but it has also more recently demonstrated potential in performing data class detection through subset selection. Leveraging the singular value decomposition for dimension reduction, DEIM uses interpolatory projection to identify the representative rows and/or columns of a data matrix. This approach has been adapted to develop additional algorithms, including a CUR matrix factorization for performing dimension reduction while preserving the interpretability of the data.

Integrative Modeling in the Age of Machine Learning: A Summary of HADDOCK Strategies in CAPRI Rounds 47-55.

Proteins

December 2024

Bijvoet Centre for Biomolecular Research, Faculty of Science-Chemistry, Utrecht University, Utrecht, The Netherlands.

The HADDOCK team participated in CAPRI rounds 47-55 as server, manual predictor, and scorers. Throughout these CAPRI rounds, we used a plethora of computational strategies to predict the structure of protein complexes. Of the 10 targets comprising 24 interfaces, we achieved acceptable or better models for 3 targets in the human category and 1 in the server category.
