The lack of publicly available, large, and unbiased datasets is a key bottleneck for the application of machine learning (ML) methods in synthetic chemistry. Data from electronic laboratory notebooks (ELNs) could provide less biased, large datasets, but no such datasets have been made publicly available. The first real-world dataset from the ELNs of a large pharmaceutical company is disclosed and its relationship to high-throughput experimentation (HTE) datasets is described.
The application of machine learning (ML) techniques to model high-throughput experimentation (HTE) datasets has seen a recent rise in popularity. Nevertheless, the ability to model the interplay between reaction components, known as interaction effects, with ML remains an outstanding challenge. Using a simulated HTE dataset, we find that the presence of irrelevant features poses a strong obstacle to learning interaction effects with common ML algorithms.
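To make the term "interaction effect" concrete, the following is a minimal sketch of a toy 2x2 reaction screen, not the simulated dataset used in the work summarized above. The component names (`cat_A`, `base_X`, and so on), the yield coefficients, and both helper functions are hypothetical and chosen only for illustration. The interaction is measured as a difference of differences: how much the effect of changing the base depends on which catalyst is used.

```python
from itertools import product

# Hypothetical 2x2 screen: component names are illustrative assumptions.
CATALYSTS = ("cat_A", "cat_B")
BASES = ("base_X", "base_Y")


def simulated_yield(catalyst, base):
    """Toy yield model (assumed): additive main effects plus one interaction term."""
    c = CATALYSTS.index(catalyst)  # 0 or 1
    b = BASES.index(base)          # 0 or 1
    # The last term is non-zero only when cat_B and base_Y are combined.
    return 20.0 + 10.0 * c + 5.0 * b + 30.0 * c * b


def interaction_contrast(yields):
    """Difference of differences: change in the base effect between catalysts."""
    base_effect_with_a = yields[("cat_A", "base_Y")] - yields[("cat_A", "base_X")]
    base_effect_with_b = yields[("cat_B", "base_Y")] - yields[("cat_B", "base_X")]
    return base_effect_with_b - base_effect_with_a


# Enumerate the full factorial design and evaluate the toy model on it.
yields = {(c, b): simulated_yield(c, b) for c, b in product(CATALYSTS, BASES)}
contrast = interaction_contrast(yields)
print(contrast)  # 30.0, the coefficient of the interaction term
```

A purely additive model would give a contrast of zero here; a non-zero value means the components cannot be optimized independently of one another.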
Ni/photoredox catalysis has emerged as a powerful platform for C(sp²)-C(sp³) bond formation. While many of these methods typically employ aryl bromides as the C(sp²) coupling partner, a variety of aliphatic radical sources have been investigated. In principle, these reactions enable access to the same product scaffolds, but it can be hard to discern which method to employ because nonstandardized sets of aryl bromides are used in scope evaluation.
Numerous disciplines, such as image recognition and language translation, have been revolutionized by using machine learning (ML) to leverage big data. In organic synthesis, providing accurate chemical reactivity predictions with supervised ML could assist chemists with reaction prediction, optimization, and mechanistic interrogation. To apply supervised ML to chemical reactions, one needs to define the object of prediction (e.