The choice of spectral similarity algorithms influences suspected soil sample provenance.

Forensic Sci Int

State Key Laboratory of Soil and Sustainable Agriculture, Institute of Soil Science, Chinese Academy of Sciences, Nanjing 210008, PR China; University of the Chinese Academy of Sciences, Beijing 100049, PR China.

Published: June 2023

Similarity algorithms are commonly used in soil forensic applications to help identify similar samples from an existing reference library as possible source locations of unknown target samples. These algorithms are well suited to comparing soil spectra. However, different similarity algorithms may lead to different clusters of similar samples, and thus different strengths of evidence in forensic investigations. To quantify this, we conducted a study to evaluate the influence of seven similarity algorithms on soil provenance, using as a sample set a soil spectral library consisting of 280 soil profiles from Anhui Province, China. This library includes three spatial scales of datasets: provincial (DS), county (DS) and field (DS). A set of ten samples covering a wide range of spectral variation was selected from the DS dataset as the "unknown" samples, with the remainder used as the reference samples. This study aimed to: (1) evaluate how several commonly used similarity algorithms, namely Euclidean distance (ED), Mahalanobis distance (MD), Spectral angle mapper (SAM), and Spectral information divergence (SID), as well as variants of several of these measured in standardized principal component space computed from the spectra (ED_PCA, MD_PCA and SAM_PCA), influence the identification of the matched similar samples; (2) determine the overlap in sample selection between different similarity algorithms; (3) propose best practices for similarity algorithms applied to soil forensic analysis using spectroscopy. The use of different similarity algorithms did influence the selection of the most similar samples. The similarity algorithms calculated in PC space (ED_PCA, MD_PCA and SAM_PCA) performed slightly better than their counterparts calculated in spectral space.
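For readers unfamiliar with the four measures named above, the following is a minimal Python sketch of ED, MD, SAM and SID using their standard textbook definitions, together with one possible reading of the PC-space variants. It is not the authors' implementation. The function names, the synthetic spectra, the choice of five components, the use of a pseudo-inverse for the covariance matrix, and the interpretation of "standardized principal component space" as PCA on band-standardized spectra are all assumptions made for illustration.

```python
import numpy as np


def euclidean_distance(x, y):
    """ED: straight-line distance between two spectra."""
    return float(np.linalg.norm(x - y))


def mahalanobis_distance(x, y, cov_inv):
    """MD: distance weighted by the inverse covariance of the reference library."""
    d = x - y
    return float(np.sqrt(d @ cov_inv @ d))


def spectral_angle(x, y):
    """SAM: angle (radians) between two spectra treated as vectors."""
    cos = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


def spectral_information_divergence(x, y, eps=1e-12):
    """SID: symmetric relative entropy between spectra normalized to sum to 1.

    Assumes strictly non-negative values such as reflectance.
    """
    p = (x + eps) / np.sum(x + eps)
    q = (y + eps) / np.sum(y + eps)
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))


def fit_standardized_pca(library, n_components):
    """Return a function projecting spectra onto the leading PCs.

    Assumption: 'standardized' means each band is centred and scaled to
    unit variance before the decomposition.
    """
    mean = library.mean(axis=0)
    std = library.std(axis=0, ddof=1)
    std = np.where(std == 0, 1.0, std)
    _, _, vt = np.linalg.svd((library - mean) / std, full_matrices=False)
    components = vt[:n_components]

    def project(spectra):
        return ((spectra - mean) / std) @ components.T

    return project


def rank_references(unknown, references, metric):
    """Indices of reference samples ordered from most to least similar."""
    distances = np.array([metric(unknown, r) for r in references])
    return np.argsort(distances, kind="stable")


# Synthetic stand-in for a spectral library: 50 samples, 20 bands (hypothetical).
rng = np.random.default_rng(0)
references = rng.uniform(0.1, 0.6, size=(50, 20))
unknown = references[7] + rng.normal(0.0, 0.001, size=20)

cov_inv = np.linalg.pinv(np.cov(references, rowvar=False))
project = fit_standardized_pca(references, n_components=5)
ref_pc, unk_pc = project(references), project(unknown)
cov_inv_pc = np.linalg.pinv(np.cov(ref_pc, rowvar=False))

# (metric, unknown, references) for each of the seven algorithms.
setups = {
    "ED": (euclidean_distance, unknown, references),
    "MD": (lambda a, b: mahalanobis_distance(a, b, cov_inv), unknown, references),
    "SAM": (spectral_angle, unknown, references),
    "SID": (spectral_information_divergence, unknown, references),
    "ED_PCA": (euclidean_distance, unk_pc, ref_pc),
    "MD_PCA": (lambda a, b: mahalanobis_distance(a, b, cov_inv_pc), unk_pc, ref_pc),
    "SAM_PCA": (spectral_angle, unk_pc, ref_pc),
}
best = {name: int(rank_references(u, r, m)[0]) for name, (m, u, r) in setups.items()}
print(best)
```

Each entry of `best` is the index of the reference sample that the corresponding algorithm ranks as most similar to the unknown. With real spectra the covariance matrix is often singular because bands outnumber samples, which is why the sketch uses a pseudo-inverse.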
Due to the availability of a detailed spectral library, regardless of the similarity algorithm used, the most similar matched samples were all located close to the unknowns, mostly within 3 km, with one exception. That is, the choice of similarity algorithm hardly influenced the conclusion on soil provenance in this case. In general, MD_PCA, SAM and ED were the best similarity algorithms overall. However, since there was no single best algorithm for all cases, we recommend the joint use of MD_PCA, SAM and ED as an ensemble. Indications of possible sample provenance from these similarity measures can be useful evidence to complement evidence from other methods in a forensic investigation.
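The overlap between algorithms and the recommended joint use of MD_PCA, SAM and ED can be illustrated with a short Python sketch. The abstract does not say how the ensemble is combined, so the mean-rank rule below is only one plausible choice, labeled here as an assumption. The distance values are invented for illustration and the function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def top_k(distances, k):
    """Set of indices of the k most similar reference samples."""
    return set(np.argsort(distances, kind="stable")[:k].tolist())


def pairwise_overlap(distance_table, k):
    """Number of shared top-k samples for every pair of algorithms."""
    tops = {name: top_k(d, k) for name, d in distance_table.items()}
    return {(a, b): len(tops[a] & tops[b]) for a, b in combinations(tops, 2)}


def ensemble_ranking(distance_table):
    """Order reference samples by their mean rank across algorithms.

    Assumption: mean rank is used as the combination rule; the source
    only recommends 'joint use' without specifying one.
    """
    ranks = [
        np.argsort(np.argsort(d, kind="stable"), kind="stable")
        for d in distance_table.values()
    ]
    return np.argsort(np.mean(ranks, axis=0), kind="stable")


# Invented distances from one unknown to five reference samples.
distance_table = {
    "MD_PCA": np.array([0.2, 1.5, 0.9, 3.0, 0.4]),
    "SAM": np.array([0.01, 0.30, 0.05, 0.50, 0.20]),
    "ED": np.array([1.0, 4.0, 0.8, 6.0, 2.5]),
}

print(pairwise_overlap(distance_table, k=2))
print(ensemble_ranking(distance_table))
```

Working on ranks rather than raw distances avoids having to put ED, SAM and MD_PCA on a common scale, since their units differ.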

Source
http://dx.doi.org/10.1016/j.forsciint.2023.111688
