Obtaining training data for constructing artificial neural networks (ANNs) to identify microbiological taxa is not always easy. Often, only small data sets with unequal numbers of observations per taxon are available. Here, the effects of training set size and of an imbalanced number of training patterns across taxa are investigated using radial basis function ANNs trained to identify up to 60 species of marine microalgae. The best networks trained to discriminate 20, 40 and 60 species gave overall correct identification rates of 92, 84 and 77%, respectively. Between 100 and 200 training patterns per species were sufficient for networks trained to discriminate 20, 40 or 60 species. For the 40- and 60-species data sets, an imbalance in the number of training patterns per species always affected training success, and the greater the imbalance, the greater the effect. However, this could be largely compensated for by adjusting the networks using a posteriori probabilities, estimated as network output values.
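The abstract does not say exactly how the networks were adjusted, so the following is only a minimal sketch of one common way to compensate for class imbalance when network outputs are read as a posteriori probabilities: divide each output by the class's share of the training set, then renormalise. The function name `rebalance_posteriors`, the parameter `train_counts`, and the example numbers are all hypothetical and are not taken from the paper.

```python
def rebalance_posteriors(outputs, train_counts):
    """Rescale class outputs as if every class had equal training share.

    outputs      -- network outputs for one sample, read as estimates of
                    P(class | input) under the training-set class mix
    train_counts -- number of training patterns per class (all > 0)

    Assumption: this prior-correction rule is illustrative only; the
    abstract does not specify the adjustment the authors used.
    """
    total = sum(train_counts)
    priors = [n / total for n in train_counts]

    # Dividing by the training prior removes the advantage that
    # heavily represented classes gain from imbalance alone.
    scaled = [max(o, 0.0) / p for o, p in zip(outputs, priors)]

    norm = sum(scaled)
    if norm == 0:
        # No evidence for any class: fall back to a uniform estimate.
        return [1.0 / len(outputs)] * len(outputs)
    return [s / norm for s in scaled]


# Made-up two-species example: species 0 had four times as many
# training patterns as species 1.
outputs = [0.6, 0.4]
train_counts = [200, 50]

adjusted = rebalance_posteriors(outputs, train_counts)
predicted = max(range(len(adjusted)), key=adjusted.__getitem__)
```

In this made-up case the raw outputs favour species 0, but after the correction the under-represented species 1 has the larger adjusted value, which is the kind of shift such an adjustment is meant to produce.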

Source
http://dx.doi.org/10.1016/s0167-7012(00)00202-5

