Nitroaromatic compounds (NACs) are a significant source of organic pollutants in the environment. In this study, a dataset of 371 NACs with rat oral median lethal doses (LD50) was compiled. Based on this dataset, binary and multiclass classification models were established. Seven machine learning algorithms were combined with six molecular fingerprints to build the prediction models. Among the binary classification models, the overall predictive accuracy in 10-fold cross-validation on the training set ranged from 0.823 to 0.874 for the top ten models. Among the multiclass classification models, the combination of graph fingerprint and random forest (Graph-RF) gave the best predictive performance, with AUC values of 0.929 for the training set and 0.956 for the test set. Model performance was further evaluated on a true external set of 1366 NACs, 96.6% of which fell within the applicability domain. We then identified the structural features that influence acute oral toxicity using information gain and substructure frequency analysis. Finally, we identified highly toxic compounds based on the structural alerts and successfully transformed a representative highly toxic compound into low-toxicity alternatives via structural modification. Overall, the constructed models facilitate environmental risk assessment and the design of green and safe chemicals.
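The information gain analysis mentioned above can be illustrated with a minimal sketch. It assumes each compound is described by a binary flag (substructure present or absent) and a toxicity class label; the abstract does not specify the study's actual feature definitions, class boundaries, or software, so the function names and the toy data below are hypothetical and serve only to show the calculation.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(has_substructure, labels):
    """Reduction in label entropy after splitting on a binary feature."""
    present = [y for x, y in zip(has_substructure, labels) if x]
    absent = [y for x, y in zip(has_substructure, labels) if not x]
    total = len(labels)
    conditional = (
        (len(present) / total) * entropy(present)
        + (len(absent) / total) * entropy(absent)
    )
    return entropy(labels) - conditional


# Toy data invented for illustration only (not taken from the study).
labels = ["high", "high", "high", "low", "low", "low", "low", "low"]
alert_a = [1, 1, 1, 0, 0, 0, 0, 0]  # hypothetical substructure that separates the classes
alert_b = [1, 0, 1, 0, 1, 0, 1, 0]  # hypothetical substructure spread across both classes

ig_a = information_gain(alert_a, labels)
ig_b = information_gain(alert_b, labels)
print(f"IG(alert_a) = {ig_a:.3f}, IG(alert_b) = {ig_b:.3f}")
```

In this toy case the first substructure removes all uncertainty about the class label, so its information gain equals the entropy of the labels, while the second carries much less information. Ranking substructures by this quantity is one common way to shortlist candidate structural alerts.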
DOI: http://dx.doi.org/10.1016/j.fct.2022.113461