The bacterial diversity revealed by high-throughput sequencing, together with its biological significance, provides a large amount of information for tracking the sources of fecal contamination. This study examined how well classification models predict the fecal source of geographically local and foreign samples using the support vector machine (SVM) algorithm, with random forest (RF) and Adaboost applied for comparison. Discriminatory sequences were selected from the Clostridiales, Bacteroidales, or Lactobacillales bacterial groups using extremely randomized trees (ExtraTrees). The representative markers comprised 1.51-12.64% of the unique sequences in the original library and accounted for 70% of the discrepancies between source microbiomes. The overall accuracy on local samples was 96.08% for the SVM model and 98.04% for the RF model, both higher than that of Adaboost (90.20%). For the non-local samples, the SVM assigned most of the fecal samples to the correct category, although several false-positive judgments occurred among closely related groups. The results suggest that the SVM is a time-saving and accurate method for fecal source tracking in contaminated water bodies, with the potential to handle geographically unassociated samples, and they underline the necessity of qPCR analysis for accurate detection of human-source pollution.
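
As an illustration only, the workflow described above (ExtraTrees-based selection of discriminatory sequences, followed by SVM, RF, and Adaboost classification) might be sketched with scikit-learn as follows. The synthetic data, the mean-importance selection threshold, the linear kernel, and all hyperparameters are assumptions made for this sketch; none of them are taken from the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    ExtraTreesClassifier,
    RandomForestClassifier,
)
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for a (samples x unique sequences) abundance table.
# This is NOT the study's data; the class labels play the role of fecal sources.
X, y = make_classification(
    n_samples=300,
    n_features=200,
    n_informative=20,
    n_redundant=0,
    n_classes=4,
    n_clusters_per_class=1,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Step 1: rank features with ExtraTrees and keep those whose importance is at
# least the mean importance (the threshold is an assumption of this sketch).
selector = SelectFromModel(
    ExtraTreesClassifier(n_estimators=200, random_state=0),
    threshold="mean",
)
selector.fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)

# Step 2: train the three classifiers on the selected features only.
models = {
    "SVM": SVC(kernel="linear"),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "AdaBoost": AdaBoostClassifier(n_estimators=100, random_state=0),
}

# Step 3: overall accuracy = fraction of held-out samples assigned to the
# correct source category.
accuracies = {}
for name, model in models.items():
    model.fit(X_train_sel, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test_sel))

for name, acc in accuracies.items():
    print(f"{name}: {acc:.2%} using {X_train_sel.shape[1]} of {X.shape[1]} features")
```

Fitting the selector on the training split only keeps the held-out samples from influencing which sequences are retained, which matters when the goal is to estimate accuracy on samples the model has not seen.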

Source: http://dx.doi.org/10.1016/j.jhazmat.2020.124821