Background: To address the need for easy and reliable species classification in plant genetic resources collections, we assessed the potential of five classifiers (Random Forest, Neighbour-Joining, 1-Nearest Neighbour, a conservative variety of 3-Nearest Neighbours and Naive Bayes) We investigated the effects of the number of accessions per species and misclassification rate on classification success, and validated theirs generic value results with three complete datasets.
Results: We found the conservative variety of 3-Nearest Neighbours to be the most reliable classifier when varying species representation and misclassification rate. Through the analysis of the three complete datasets, this finding showed generic value.