A key step in building accurate and reliable machine learning models that predict properties from quantum chemistry (QC) data is identifying data characteristics that may negatively influence model training. In previous work, we found that molecules and materials whose atomic positions have a low convex hull volume (VCH) can be harmful to model training and a source of prediction outliers. In this paper, we extend that analysis with a biased sampling study that evaluates how the VCH of the training data influences a model, using different molecular and material structures. Our study confirms that VCH influences model training and shows the importance of homogeneous geometric characteristics of structures when building new data sets or selecting training sets from larger QC data sets.
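To make the VCH descriptor concrete, the following is a minimal sketch of how the convex hull volume of a set of atomic positions could be computed. It is an illustration under stated assumptions, not the implementation used in the paper: the function name `convex_hull_volume` and the tolerance `tol` are hypothetical, atoms are treated as points, periodic boundary conditions are ignored, and the use of NumPy and SciPy (`scipy.spatial.ConvexHull`) is our choice.

```python
import numpy as np
from scipy.spatial import ConvexHull


def convex_hull_volume(positions, tol=1e-8):
    """Return the convex hull volume of an (N, 3) array of atomic positions.

    Hypothetical helper for illustration only. Linear or planar arrangements,
    and structures with fewer than four atoms, span no volume, so they are
    reported as 0.0 instead of being passed to Qhull.
    """
    coords = np.asarray(positions, dtype=float)
    if coords.ndim != 2 or coords.shape[1] != 3:
        raise ValueError("positions must have shape (N, 3)")
    if len(coords) < 4:
        return 0.0

    # A point set whose centered coordinates have rank < 3 is flat (or linear).
    centered = coords - coords.mean(axis=0)
    if np.linalg.matrix_rank(centered, tol=tol) < 3:
        return 0.0

    # The volume is in the cube of whatever length unit the positions use.
    return float(ConvexHull(coords).volume)


# Purely geometric test inputs (not real molecular geometries).
tetrahedron = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
planar_square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]

print(convex_hull_volume(tetrahedron))    # about 1/6
print(convex_hull_volume(planar_square))  # 0.0
```

Under this sketch, flat or linear structures get a VCH of exactly zero, which is one plausible reading of what makes a structure "low VCH"; any threshold separating low from high values would have to come from the data set being studied.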
DOI: http://dx.doi.org/10.1021/acs.jcim.3c00242