The QM9 dataset has become the gold standard for machine learning (ML) predictions of various chemical properties. QM9 is based on the GDB, a combinatorial exploration of chemical space. ML predictions of molecular properties have recently been published with accuracy on par with density functional theory (DFT) calculations. Such ML models need to be tested on real data, and their ability to generalize to it assessed. This article presents PC9, a new QM9-equivalent dataset drawn from the PubChemQC project (only H, C, N, O and F, and up to 9 "heavy" atoms). A statistical study of bonding distances and chemical functions shows that the new dataset encompasses more chemical diversity. Kernel Ridge Regression, Elastic Net and the neural network model provided by SchNet were applied to both datasets. The overall accuracy in energy prediction is higher for the QM9 subset. However, a model trained on PC9 shows a stronger ability to predict energies of the other dataset.
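The cross-dataset comparison described above (train on one dataset, predict energies on the other) can be sketched as follows. This is a minimal illustration under stated assumptions, not the article's protocol: the two toy datasets are synthetic stand-ins (a "narrow" and a "broad" feature range), the target is an arbitrary smooth function rather than a molecular energy, and the helper names (`krr_fit`, `krr_predict`, `cross_dataset_mae`, `make_toy_dataset`) as well as the values of `sigma` and `lam` are hypothetical. Kernel Ridge Regression is written out in closed form with NumPy so that the block is self-contained.

```python
import numpy as np


def gaussian_kernel(a, b, sigma):
    # Pairwise squared Euclidean distances between the rows of a and b.
    d2 = (a**2).sum(1)[:, None] + (b**2).sum(1)[None, :] - 2.0 * a @ b.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))


def krr_fit(x_train, y_train, sigma=1.0, lam=1e-3):
    # Closed-form Kernel Ridge Regression: alpha = (K + lam * I)^-1 y.
    k = gaussian_kernel(x_train, x_train, sigma)
    return np.linalg.solve(k + lam * np.eye(len(x_train)), y_train)


def krr_predict(x_train, alpha, x_test, sigma=1.0):
    # Prediction is the kernel between test and training points times alpha.
    return gaussian_kernel(x_test, x_train, sigma) @ alpha


def cross_dataset_mae(datasets, sigma=1.0, lam=1e-3):
    # For every (training set, test set) pair, fit on the training split of
    # one dataset and report the mean absolute error on the test split of
    # the other (or of the same dataset, for the in-domain baseline).
    results = {}
    for train_name, (x_tr, y_tr, _, _) in datasets.items():
        alpha = krr_fit(x_tr, y_tr, sigma, lam)
        for test_name, (_, _, x_te, y_te) in datasets.items():
            pred = krr_predict(x_tr, alpha, x_te, sigma)
            results[(train_name, test_name)] = float(np.mean(np.abs(pred - y_te)))
    return results


def make_toy_dataset(rng, low, high, n_train=200, n_test=100):
    # Synthetic placeholder data: 3 random features and a smooth target.
    # These are NOT molecular descriptors or computed energies.
    x = rng.uniform(low, high, size=(n_train + n_test, 3))
    y = np.sin(x).sum(axis=1)
    return x[:n_train], y[:n_train], x[n_train:], y[n_train:]


rng = np.random.default_rng(0)
datasets = {
    "narrow": make_toy_dataset(rng, -1.0, 1.0),  # less diverse stand-in
    "broad": make_toy_dataset(rng, -3.0, 3.0),   # more diverse stand-in
}
results = cross_dataset_mae(datasets)
for (train_name, test_name), err in sorted(results.items()):
    print(f"train={train_name:6s} test={test_name:6s} MAE={err:.3f}")
```

The diagonal entries of `results` give the in-domain error and the off-diagonal entries give the transfer error, which is the quantity the abstract refers to when it compares how well a model trained on one dataset predicts the other.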


Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6852905 (PMC)
http://dx.doi.org/10.1186/s13321-019-0391-2 (DOI)


