How can animals learn the prey densities available in an environment that changes unpredictably from day to day, and how much effort should they devote to doing so, rather than exploiting what they already know? Using a two-armed bandit situation, we simulated several processes that might explain the trade-off between exploring and exploiting. They included an optimising model, dynamic backward sampling; a dynamic version of the matching law; the Rescorla-Wagner model; a neural network model; and ɛ-greedy and rule of thumb models derived from the study of reinforcement learning in artificial intelligence. Under conditions like those used in published studies of birds' performance under two-armed bandit conditions, all models usually identified the more profitable source of reward, and did so more quickly when the reward probability differential was greater.
View Article and Find Full Text PDF