In this study, a novel exploration method for centralized training and decentralized execution (CTDE)-based multi-agent reinforcement learning (MARL) is introduced. The method uses the concept of strangeness, which is determined by evaluating (1) the level of the unfamiliarity of the observations an agent encounters and (2) the level of the unfamiliarity of the entire state the agents visit. An exploration bonus, which is derived from the concept of strangeness, is combined with the extrinsic reward obtained from the environment to form a mixed reward, which is then used for training CTDE-based MARL algorithms.
View Article and Find Full Text PDFIntralogistics is a technology that optimizes, integrates, automates, and manages the logistics flow of goods within a logistics transportation and sortation center. As the demand for parcel transportation increases, many sortation systems have been developed. In general, the goal of sortation systems is to route (or sort) parcels correctly and quickly.
View Article and Find Full Text PDF