Since the EU banned animal testing for cosmetic products and ingredients in 2013, many defined approaches (DAs) for skin sensitization assessment have been developed. Machine learning models have been shown to be effective in DAs, but their predictivity may be affected by data imbalance (i.e., more sensitizers than non-sensitizers) and by the limited information in the databases. To improve the predictivity of DAs, we applied data-rebalancing ensemble learning (bagging with a support vector machine (SVM)) to a novel and comprehensive Cosmetics Europe database. For each of the two endpoints, human hazard and three-class potency, 12 models were built using a training set of 96 substances and a test set of 32 substances from the database. The model with the highest accuracy for predicting hazard (90.63% for the test set and 88.54% for the training set, named hazard-DA) used SVM-bagging with the combination of all variables (V6), while the model with the highest accuracy for predicting potency (68.75% for the test set and 82.29% for the training set, named potency-DA) used SVM alone. Both DAs showed higher performance than the LLNA and other machine-learning-based DAs, and the potency-DA could provide a more in-depth assessment. These findings indicate that SVM-bagging-based DAs provide enhanced predictivity for hazard assessment through further data rebalancing. For potency assessment, by contrast, the effect of imbalanced data might be offset by the more detailed categorization of sensitizers, so an SVM-based DA without bagging could provide sufficient predictivity. The improved DAs in this study could be promising tools for skin sensitization assessment without animal testing.
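To make the idea of data-rebalancing bagging with an SVM concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the kernel, the rebalancing scheme, the number of bootstrap models, or the input variables, so everything here is an assumption: each bootstrap sample is drawn with equal numbers of sensitizers and non-sensitizers, the base learner is a small linear SVM without a bias term trained by subgradient descent on the hinge loss, and the toy data and all function names are hypothetical.

```python
import random
from collections import Counter


def balanced_bootstrap(labels, rng):
    """Draw a bootstrap sample with the same number of items from each class."""
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    n = min(len(idx) for idx in by_class.values())  # minority-class size
    sample = []
    for idx in by_class.values():
        sample.extend(rng.choices(idx, k=n))  # sampling with replacement
    return sample


def train_linear_svm(X, y, rng, epochs=50, lr=0.1, lam=0.01):
    """Linear SVM (no bias term) fitted by hinge-loss subgradient descent.

    Labels must be +1 or -1. This stands in for whatever SVM the study used.
    """
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            for j in range(len(w)):
                # L2 penalty always applies; hinge term only inside the margin.
                hinge = y[i] * X[i][j] if margin < 1 else 0.0
                w[j] -= lr * (lam * w[j] - hinge)
    return w


def svm_predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


def fit_svm_bagging(X, y, n_models=25, seed=0):
    """Train one SVM per class-balanced bootstrap sample."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = balanced_bootstrap(y, rng)
        models.append(train_linear_svm([X[i] for i in idx], [y[i] for i in idx], rng))
    return models


def bagging_predict(models, x):
    """Majority vote over the ensemble (odd n_models avoids ties)."""
    votes = Counter(svm_predict(w, x) for w in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: +1 = sensitizer (majority), -1 = non-sensitizer.
X = [[2.0, 1.5], [1.8, 2.2], [2.5, 1.9], [1.6, 2.4], [2.1, 2.0], [2.3, 1.7],
     [-2.0, -1.8], [-1.7, -2.2]]
y = [1, 1, 1, 1, 1, 1, -1, -1]

models = fit_svm_bagging(X, y)
print(bagging_predict(models, [2.0, 2.0]), bagging_predict(models, [-2.0, -2.0]))
```

Because every base model sees a class-balanced sample, no single model is dominated by the majority class, and the vote averages out the variance introduced by discarding part of the majority class in each draw.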
DOI: http://dx.doi.org/10.14573/altex.1809191