Publications by authors named "Ben Blaiszik"

The current generation of large language models (LLMs) has limited chemical knowledge. Recently, it has been shown that these LLMs can learn to predict chemical properties through fine-tuning. Using natural language to train machine learning models makes them accessible to a wider chemical audience, because field-specific featurization techniques can be omitted.

Article Synopsis
  • The study focuses on how electrochemical bubbles affect the performance of gas-evolving electrodes, noting that previous research has not thoroughly examined the inactivation that bubbles cause as they evolve.
  • By using surface engineering to control bubble formation, the researchers demonstrate that the commonly held assumption that inactivation affects the entire projected area of the electrode is inaccurate.
  • Using machine learning to detect bubbles, the study shows that bubbles have a smaller impact on surface-engineered electrodes, which leads to a more accurate method for estimating inactivation from the area of direct bubble contact.

Rotational spectroscopy is the most accurate method for determining structures of molecules in the gas phase. It is often assumed that a rotational spectrum is a unique "fingerprint" of a molecule. The availability of large molecular databases and the development of artificial intelligence methods for spectroscopy make the testing of this assumption timely.

Article Synopsis
  • Large-language models like GPT-4 have sparked interest among scientists, especially in fields like chemistry and materials science.
  • A hackathon was organized to explore their potential applications, resulting in various projects such as predicting molecular properties and developing educational tools.
  • The rapid prototyping of ideas during the hackathon suggests that LLMs could significantly influence many scientific disciplines beyond chemistry and materials science.

The information content of atomic-resolution scanning transmission electron microscopy (STEM) images can often be reduced to a handful of parameters describing each atomic column, chief among which is the column position. Neural networks (NNs) are high-performance, computationally efficient methods for automatically locating atomic columns in images, which has led to a profusion of NN models and associated training datasets. We have developed a benchmark dataset of simulated and experimental STEM images and used it to evaluate the performance of two sets of recent NN models for atom location in STEM images.

A foundational set of findable, accessible, interoperable, and reusable (FAIR) principles was proposed in 2016 as a prerequisite for proper data management and stewardship, with the goal of enabling the reuse of scholarly data. The principles were also meant to apply, at a high level, to other digital assets, and over time the FAIR guiding principles have been reinterpreted or extended to include the software, tools, algorithms, and workflows that produce data. FAIR principles are now being adapted to AI models and datasets.

The availability of materials data for impact-mitigating materials has lagged behind that of application-based data. For example, data describing on-field helmeted impacts are available, whereas open datasets on the behavior of the constituent impact-mitigating materials used in helmet designs are lacking. Here, we describe a new FAIR (findable, accessible, interoperable, reusable) data framework with structural and mechanical response data for one example elastic impact protection foam.

Protein-ligand docking is a computational method for identifying drug leads. The method is capable of narrowing a vast library of compounds down to a tractable size for downstream simulation or experimental testing and is widely used in drug discovery. While there has been progress in accelerating scoring of compounds with artificial intelligence, few works have bridged these successes back to the virtual screening community in terms of utility and forward-looking development.

A concise and measurable set of FAIR (Findable, Accessible, Interoperable and Reusable) principles for scientific data is transforming the state of practice for data management and stewardship, supporting and enabling discovery and innovation. Learning from this initiative, and acknowledging the impact of artificial intelligence (AI) on the practice of science and engineering, we introduce a set of practical, concise, and measurable FAIR principles for AI models. We showcase how to create and share FAIR data and AI models within a unified computational framework that combines the Advanced Photon Source at Argonne National Laboratory, the Materials Data Facility, the Data and Learning Hub for Science, funcX, and the Argonne Leadership Computing Facility (ALCF), in particular the ThetaGPU supercomputer and the SambaNova DataScale system at the ALCF AI Testbed.

Powerful detectors at modern experimental facilities routinely collect data at multiple GB/s. Online analysis methods are needed so that only interesting subsets of such massive data streams are collected, for example by explicitly discarding some data elements or by directing instruments to relevant areas of experimental space. Thus, methods are required for configuring and running distributed computing pipelines (what we call flows) that link instruments, computers (e.

Bioinspired photocatalysis has resulted in efficient solutions for many areas of science and technology, from solar cells to medicine. Here we show a new bioinspired semiconductor nanocomposite (nanoTiO2-DOPA-luciferase, TiDoL) capable of converting light energy within cancerous tissues into chemical species that are highly disruptive to cell metabolism and lead to cell death. This localized activity of semiconductor nanocomposites is triggered by cancer-generated activators.

Despite the recent availability of vaccines against severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the search for inhibitory therapeutic agents has become important, especially in the context of emerging viral variants. In this paper, we describe the discovery of a novel noncovalent small-molecule inhibitor, MCULE-5948770040, that binds to and inhibits the SARS-CoV-2 main protease (Mpro). The inhibitor was found by employing a scalable high-throughput virtual screening (HTVS) framework and a targeted library of over 6.5 million compounds that could be readily ordered and purchased.

Recent machine learning models for bandgap prediction that explicitly encode structural information into the model feature set are significantly more accurate than both traditional machine learning and non-graph-based deep learning methods. The ongoing rapid growth of open-access bandgap databases can benefit the construction of such models, not only by expanding their domain of applicability but also by requiring constant updating of the model. Here, we build a new state-of-the-art multi-fidelity graph network model for bandgap prediction of crystalline compounds from a large database of experimental and density functional theory (DFT) computed bandgaps with over 806,600 entries (1,500 experimental, 775,700 low-fidelity DFT, and 29,400 high-fidelity DFT).

Researchers worldwide are seeking to repurpose existing drugs or discover new drugs to counter the disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). A promising source of candidates for such studies is molecules that have been reported in the scientific literature to be drug-like in the context of viral research. However, this literature is too large for human review and features unusual vocabularies for which existing named entity recognition (NER) models are ineffective.

The solvation properties of molecules, often estimated using quantum chemical simulations, are important in the synthesis of energy storage materials, drugs, and industrial chemicals. Here, we develop machine learning models of solvation energies that replace expensive quantum chemistry calculations with inexpensive-to-compute message-passing neural network models that require only the molecular graph as input. Our models are trained on a new database of solvation energies for 130,258 molecules from the QM9 dataset, computed in five solvents (acetone, ethanol, acetonitrile, dimethyl sulfoxide, and water) with an implicit solvent model.

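To make "only the molecular graph as input" concrete, here is a minimal, untrained sketch of a generic message-passing network in plain Python. It is not the authors' model: the one-hot atom features, the weight values, the number of message-passing rounds, and the sum readout are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def relu(v: Sequence[float]) -> Vector:
    return [max(0.0, x) for x in v]


def mpnn_energy(
    features: Dict[int, Vector],
    bonds: List[Tuple[int, int]],
    w_msg: Matrix,
    w_self: Matrix,
    w_out: Vector,
    rounds: int = 2,
) -> float:
    """Map a molecular graph to one scalar (e.g. a solvation energy).

    features: atom index -> feature vector (hypothetical encoding)
    bonds: undirected edges given as (i, j) atom-index pairs
    All weights are hypothetical; a real model would learn them.
    """
    # Build an adjacency list from the undirected bond list.
    neighbors: Dict[int, List[int]] = {atom: [] for atom in features}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    h = dict(features)
    for _ in range(rounds):
        new_h: Dict[int, Vector] = {}
        for v in h:
            # Message: sum of transformed neighbor states.
            msg = [0.0] * len(w_msg)
            for u in neighbors[v]:
                msg = [a + b for a, b in zip(msg, matvec(w_msg, h[u]))]
            # Update: combine the atom's own state with the aggregated message.
            new_h[v] = relu([a + b for a, b in zip(matvec(w_self, h[v]), msg)])
        h = new_h

    # Readout: sum over atoms (independent of atom order), then a linear layer.
    pooled = [sum(col) for col in zip(*h.values())]
    return sum(w * x for w, x in zip(w_out, pooled))


if __name__ == "__main__":
    # Heavy-atom graph of ethanol (C-C-O), one-hot features [is_C, is_O].
    atoms = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}
    print(mpnn_energy(atoms, [(0, 1), (1, 2)],
                      w_msg=[[0.5, 0.1], [0.2, 0.3]],
                      w_self=[[1.0, 0.0], [0.0, 1.0]],
                      w_out=[1.0, -0.5]))
```

Because messages are summed over neighbors and the readout sums over atoms, relabeling the atoms does not change the prediction. This is one reason graph-based models need no hand-built, order-dependent featurization.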

This letter announces the Virtual Excited State Reference for the Discovery of Electronic Materials Database, the first database to include downloadable excited-state structures (S0, S1, T1) and photophysical properties. The database is searchable and open access via www.verdedb.
