Extensive efforts to gather materials data have largely overlooked potential data redundancy. In this study, we present evidence of a significant degree of redundancy across multiple large datasets for various material properties, by revealing that up to 95% of data can be safely removed from machine learning training with little impact on in-distribution prediction performance. The redundant data is related to over-represented material types and does not mitigate the severe performance degradation on out-of-distribution samples.
Computer vision techniques have immense potential for materials design applications. In this work, we introduce an integrated and general-purpose AtomVision library that can be used to generate and curate microscopy image (such as scanning tunneling microscopy and scanning transmission electron microscopy) data sets and apply a variety of machine learning techniques. To demonstrate the applicability of this library, we (1) establish an atomistic image data set of about 10 000 materials with large structural and chemical diversity, (2) develop and compare convolutional and atomistic line graph neural network models to classify the Bravais lattices, (3) demonstrate the application of fully convolutional neural networks using U-Net architecture to pixelwise classify atom versus background, (4) use a generative adversarial network for super resolution, (5) curate an image data set on the basis of natural language processing using an open-access arXiv data set, and (6) integrate the computational framework with experimental microscopy images for Rh, FeO, and SnS systems.
The application of machine learning to the materials domain has traditionally struggled with two major challenges: a lack of large, curated data sets and the need to understand the physics behind the machine-learning prediction. The former problem is particularly acute in the polymers domain. Here we aim to tackle both challenges simultaneously by incorporating scientific knowledge, thus providing improved predictions for smaller data sets, under both interpolation and extrapolation, along with a degree of explainability.
Uncertainty quantification in artificial intelligence (AI)-based predictions of material properties is of immense importance for the success and reliability of AI applications in materials science. While confidence intervals are commonly reported for machine learning (ML) models, prediction intervals, i.e.