Three-dimensional spatial prediction of Zn in the soil of a former tire manufacturing plant using machine learning and readily attainable multisource auxiliary data.

Environ Pollut

State Key Laboratory of Soil and Sustainable Agriculture, Institute of Soil Science, Chinese Academy of Sciences, Nanjing, 210008, China; University of Chinese Academy of Sciences, Beijing, 100049, China.

Published: February 2023

Pollutants in the soil of industrial sites are often distributed highly heterogeneously, which makes it challenging to predict their three-dimensional (3D) spatial distributions accurately. Here we attempt to build effective 3D prediction models using machine learning (ML) and readily attainable multisource auxiliary data to improve the prediction accuracy for highly heterogeneous Zn in the soil of a small industrial site. Using raw covariates from the functional area layout, stratigraphic succession, and electrical resistivity tomography, together with covariates derived from these raw covariates, as predictors, we created six individual models and two ensemble models for Zn, based on ML algorithms such as k-nearest neighbors, random forest, and extreme gradient boosting, and on the stacking approach in ensemble ML. The overall 3D spatial patterns of Zn predicted by the individual and ensemble ML models, inverse distance weighting (IDW), and ordinary kriging (OK) were similar, but their predictive performances differed significantly. The ensemble model with raw and derived covariates represented the complex 3D spatial patterns of Zn most accurately (R² = 0.45, RMSE = 344.80 mg kg⁻¹), compared with the individual ML models (R² = 0.27-0.44, RMSE = 396.75-348.56 mg kg⁻¹), OK (R² = 0.33, RMSE = 381.12 mg kg⁻¹), and IDW interpolation (R² = 0.25, RMSE = 402.94 mg kg⁻¹). Moreover, incorporating derived covariates yielded larger gains in prediction accuracy than adopting ensemble ML in place of a single ML algorithm. These results highlight the importance of developing derived covariates when adopting ML to predict the 3D distribution of highly heterogeneous pollutants in the soil of small industrial sites.
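To make the IDW baseline and the accuracy statistics quoted above more concrete, the following is a minimal sketch in plain Python, not the study's implementation. It interpolates a value at a 3D location as a distance-weighted mean of the sampled values, then scores the predictions with RMSE and a goodness-of-fit score. Everything specific here is an assumption made for illustration: the function names, the power parameter of 2, the leave-one-out validation scheme, the use of the coefficient of determination as the goodness-of-fit score, and the synthetic sample data, which do not come from the study.

```python
import math
import random


def idw_predict(train_xyz, train_values, query_xyz, power=2.0):
    """Inverse distance weighting in 3D: a weighted mean of observed values
    with weights 1 / distance**power (power=2 is an assumed default)."""
    weighted_sum = 0.0
    weight_total = 0.0
    for point, value in zip(train_xyz, train_values):
        dist = math.dist(point, query_xyz)
        if dist == 0.0:
            return value  # the query coincides with a sampled location
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def rmse(observed, predicted):
    """Root mean squared error, in the units of the observations."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Synthetic, hypothetical samples: (x, y, depth) in metres and a made-up
# concentration field with one hotspot that decays with depth.
rng = random.Random(0)
points = [
    (rng.uniform(0, 100), rng.uniform(0, 100), rng.uniform(0, 10))
    for _ in range(60)
]
values = [
    500.0 * math.exp(-((x - 50) ** 2 + (y - 50) ** 2) / 800.0 - z / 5.0)
    + rng.gauss(0, 5)
    for x, y, z in points
]

# Leave-one-out validation (an assumed scheme): predict each sample
# from all of the other samples.
loo_pred = []
for i in range(len(points)):
    other_xyz = points[:i] + points[i + 1:]
    other_values = values[:i] + values[i + 1:]
    loo_pred.append(idw_predict(other_xyz, other_values, points[i]))

print(f"RMSE = {rmse(values, loo_pred):.2f}, R2 = {r_squared(values, loo_pred):.2f}")
```

Because IDW uses only sample coordinates and values, it cannot draw on covariates such as functional area layout or resistivity, which is one plausible reason a covariate-driven ML model could outperform it on a highly heterogeneous site.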

Source
http://dx.doi.org/10.1016/j.envpol.2022.120931