When existing experimental data are combined with machine learning (ML) to predict the performance of new materials, the bias in data acquisition determines both the usefulness of ML and its prediction accuracy. In this context, two conditions are very common: (i) constructing new unbiased datasets is too expensive, and a limited number of novel measurements does not effectively change the global knowledge; (ii) the performance of the material depends on a limited number of physical parameters, far fewer than the variables that can be changed, although these parameters are unknown or not measurable. To determine the usefulness of ML under these conditions, we introduce the concept of simulated research landscapes, which describe how datasets of arbitrary complexity evolve over time. Simulated research landscapes allow us to compare standard materials exploration with ML-guided exploration under different discovery strategies, i.e. we can measure quantitatively the benefit of using a specific ML model. We show that there is a window of opportunity for obtaining a significant benefit from ML-guided strategies. ML can be adopted too soon (when there is not enough information to find patterns) or too late (when dense datasets leave only a negligible ML benefit), and in some cases its adoption can even slow down the discovery process. We offer a qualitative guide on when ML can accelerate the discovery of new best-performing materials in a field under specific conditions. The answer in each case depends on factors such as data dimensionality, corrugation and data collection strategy. We consider how these factors may affect the prediction capabilities of ML and discuss some general trends.
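The idea of comparing exploration strategies on a simulated landscape can be illustrated with a toy sketch. It is not the authors' model: the landscape function, the hidden linear projection, the candidate pool, and the k-nearest-neighbour predictor standing in for "a specific ML model" are all assumptions made for illustration, as are the names `make_landscape`, `knn_predict` and `explore`. Performance here depends on two hidden parameters while six variables can be changed, and both strategies start from the same initial measurements.

```python
import math
import random


def make_landscape(n_vars=6, n_hidden=2, corrugation=3.0, seed=0):
    """Return a performance function on [0, 1]^n_vars driven by n_hidden latent parameters."""
    rng = random.Random(seed)
    # Assumed hidden projection: each latent parameter is a fixed linear
    # combination of the adjustable variables (unknown to the explorer).
    weights = [[rng.uniform(-1, 1) for _ in range(n_vars)] for _ in range(n_hidden)]

    def performance(x):
        latents = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
        # A larger corrugation value gives a rougher landscape; output lies in [-1, 1].
        return sum(math.cos(corrugation * z) for z in latents) / n_hidden

    return performance


def knn_predict(x, measured, k=3):
    """Predict performance at x as the mean over its k nearest measured points."""
    nearest = sorted(measured, key=lambda m: math.dist(x, m[0]))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def explore(pool, performance, n_initial, n_steps, guided, seed=1):
    """Return the best performance found after each step of an exploration run."""
    rng = random.Random(seed)
    remaining = list(pool)
    rng.shuffle(remaining)
    # Existing data: the first n_initial candidates are already measured.
    measured = [(x, performance(x)) for x in remaining[:n_initial]]
    remaining = remaining[n_initial:]
    best = max(y for _, y in measured)
    history = [best]
    for _ in range(n_steps):
        if guided:
            # ML-guided: measure the candidate with the highest predicted performance.
            idx = max(range(len(remaining)),
                      key=lambda i: knn_predict(remaining[i], measured))
        else:
            # Unguided baseline: measure a random candidate.
            idx = rng.randrange(len(remaining))
        x = remaining.pop(idx)
        y = performance(x)
        measured.append((x, y))
        best = max(best, y)
        history.append(best)
    return history


if __name__ == "__main__":
    pool_rng = random.Random(42)
    pool = [tuple(pool_rng.random() for _ in range(6)) for _ in range(300)]
    f = make_landscape()
    # Vary how much data exists before ML is adopted (sparse, medium, dense).
    for n_initial in (5, 50, 200):
        unguided = explore(pool, f, n_initial, 30, guided=False)
        guided = explore(pool, f, n_initial, 30, guided=True)
        print(n_initial, round(unguided[-1], 3), round(guided[-1], 3))
```

Comparing the two best-so-far curves for different values of `n_initial`, `corrugation` and `n_vars` gives a rough feel for how the benefit of guidance could depend on data density, corrugation and dimensionality; whether the guided run wins in any given setting is not guaranteed by this sketch.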

Source
http://dx.doi.org/10.1039/d1cp01761f
