Probing machine learning models based on high throughput experimentation data for the discovery of asymmetric hydrogenation catalysts.

Adarsh V Kalikadien, Cecile Valsecchi, Robbert van Putten, Tor Maes, Mikko Muuronen, Natalia Dyubankova, Laurent Lefort, Evgeny A Pidko

Chem Sci

Inorganic Systems Engineering, Department of Chemical Engineering, Faculty of Applied Sciences, Delft University of Technology, Van der Maasweg 9, 2629 HZ Delft, The Netherlands.

Published: August 2024

Enantioselective hydrogenation of olefins by Rh-based chiral catalysts has been extensively studied for more than 50 years. Naively, one would expect that everything about this transformation is known and that selecting a catalyst that induces the desired reactivity or selectivity is a trivial task. Nonetheless, ligand engineering or selection for any new prochiral olefin remains an empirical trial-and-error exercise. In this study, we investigated whether machine learning techniques could be used to accelerate the identification of the most efficient chiral ligand. For this purpose, we used high-throughput experimentation to build a large dataset of results for Rh-catalyzed asymmetric olefin hydrogenation, specifically designed for applications in machine learning. We showcased its alignment with existing literature while addressing observed discrepancies. Additionally, a computational framework for the automated and reproducible quantum-chemistry-based featurization of catalyst structures was created. Together with less computationally demanding representations, these descriptors were fed into our machine learning pipeline for both out-of-domain and in-domain prediction tasks of selectivity and reactivity. For out-of-domain purposes, our models showed limited efficacy. It was found that even the most expensive descriptors do not impart significant meaning to the model predictions. The in-domain application, while partly successful for predictions of conversion, emphasizes the need for evaluating the cost-benefit ratio of computationally intensive descriptors and for tailored descriptor design. Challenges persist in predicting enantioselectivity, calling for caution in interpreting results from small datasets. Our insights underscore the importance of dataset diversity with broad substrate inclusion and suggest that mechanistic considerations could improve the accuracy of statistical models.
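The distinction between in-domain and out-of-domain prediction mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the authors split their data, so everything below is an assumption made for illustration: the record layout, the `ligand` grouping key, the function names, and the placeholder values are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def in_domain_split(records, test_fraction=0.2, seed=0):
    """Random split: ligands in the test set may also occur in training."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def out_of_domain_splits(records, group_key="ligand"):
    """Leave-one-group-out: the held-out group never appears in training."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec)
    for held_out, test in groups.items():
        train = [r for r in records if r[group_key] != held_out]
        yield held_out, train, test


# Hypothetical toy records: labels are invented and "ee" is a placeholder,
# not a measured enantiomeric excess.
records = [
    {"ligand": f"L{i}", "substrate": f"S{j}", "ee": 0.0}
    for i in range(4)
    for j in range(3)
]

train, test = in_domain_split(records)
folds = list(out_of_domain_splits(records))
```

In a sketch like this, a model that scores well on the random split but poorly on the leave-one-group-out folds would be interpolating among known ligands rather than generalizing to new ones. Passing `group_key="substrate"` would probe generalization to unseen substrates instead.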

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11352728
DOI: http://dx.doi.org/10.1039/d4sc03647f

Similar Publications

Adaptive deep feature representation learning for cross-subject EEG decoding.

BMC Bioinformatics

December 2024

College of Computer and Information Engineering/College of Artificial Intelligence, Nanjing Tech University, Nanjing, 210093, China.

Shuang Liang, Linzhe Li, Wei Zu, Wei Feng, Wenlong Hang

Background: The collection of substantial amounts of electroencephalogram (EEG) data is typically time-consuming and labor-intensive, which adversely impacts the development of decoding models with strong generalizability, particularly when the available data is limited. Utilizing sufficient EEG data from other subjects to aid in modeling the target subject presents a potential solution, commonly referred to as domain adaptation. Most current domain adaptation techniques for EEG decoding primarily focus on learning shared feature representations through domain alignment strategies.

Automated differentiation of wide QRS complex tachycardia using QRS complex polarity.

Commun Med (Lond)

December 2024

Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA.

Adam M May, Bhavesh B Katbamna, Preet A Shaikh, Sarah LoCoco, Elena Deych

Background: Wide QRS complex tachycardia (WCT) differentiation into ventricular tachycardia (VT) and supraventricular wide complex tachycardia (SWCT) remains challenging despite numerous 12-lead electrocardiogram (ECG) criteria and algorithms. Automated solutions leveraging computerized ECG interpretation (CEI) measurements and engineered features offer practical ways to improve diagnostic accuracy. We propose automated algorithms based on (i) WCT QRS polarity direction (WCT Polarity Code [WCT-PC]) and (ii) QRS polarity shifts between WCT and baseline ECGs (QRS Polarity Shift [QRS-PS]).

AutoML based workflow for design of experiments (DOE) selection and benchmarking data acquisition strategies with simulation models.

Sci Rep

December 2024

Aschaffenburg University of Applied Sciences, Faculty of Engineering, Aschaffenburg, 63743, Germany.

Xukuan Xu, Donghui Li, Jinghou Bi, Michael Moeckel

Design of experiments (DOE) is an established method to allocate resources for efficient parameter space exploration. Model based active learning (AL) data sampling strategies have shown potential for further optimization. This paper introduces a workflow for conducting DOE comparative studies using automated machine learning.

Understanding the coupled relationship between regional longevity and physical geographical environment in Hechi, Guangxi, China.

Sci Rep

December 2024

Department of Infrastructure, The University of Melbourne, Melbourne, Australia.

Qucheng Deng, Yaqing Liu, Yongping Wei, Wei Liang, Kaixian Zhu

Healthy ageing plays an important role in ageing societies in many countries, and centenarians are a sign of longevity. Longevity and its determinants have become issues of global concern and a focus of research. Although many disciplines have conducted studies on longevity phenomena, few have systematically considered the impact of geographical environmental factors.

Self-supervised denoising of grating-based phase-contrast computed tomography.

Sci Rep

December 2024

Research Group Biomedical Imaging Physics, Department of Physics, TUM School of Natural Sciences, Technical University of Munich, 85748, Garching, Germany.

Sami Wirtensohn, Clemens Schmid, Daniel Berthe, Dominik John, Lisa Heck

In the last decade, grating-based phase-contrast computed tomography (gbPC-CT) has received growing interest. It provides additional information about the refractive index decrement in the sample. This signal shows an increased soft-tissue contrast.
