Model performance can be further improved with extra guidance beyond the one-hot ground truth. To this end, recently proposed recollection-based methods exploit the valuable information contained in the past training history and derive a "recollection" from it, which serves as a data-driven prior to guide training. In this article, we focus on two fundamental aspects of this approach, i.e., recollection construction and recollection utilization. Specifically, to meet the varying demands of models with different capacities and at different training periods, we propose to construct a set of recollections with diverse distributions from the same training history. All the recollections then collaborate to provide guidance that adapts to different model capacities, as well as to different training periods, according to our similarity-based elastic knowledge distillation (KD) algorithm. Without any external prior to guide the training, our method achieves a significant performance gain, outperforms methods of the same category, and even performs on par with KD using a well-trained teacher. Extensive experiments and further analysis demonstrate the effectiveness of our method.
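The abstract does not spell out how recollections are constructed or weighted, so the following PyTorch sketch only illustrates one plausible reading under stated assumptions: recollections are kept as per-sample exponential moving averages of past softened predictions with different momenta, and each recollection's KD term is weighted by its cosine similarity to the current prediction. The names RecollectionBank and elastic_kd_loss, the momenta, and the temperature are illustrative choices, not the paper's specification.

```python
# Hedged sketch of similarity-weighted distillation from several "recollections"
# of past predictions. The construction (per-sample EMAs of softened outputs with
# different momenta) and the cosine-similarity weighting are assumptions made for
# illustration, not the exact algorithm of the TNNLS article.
import torch
import torch.nn.functional as F


class RecollectionBank:
    """Keeps one EMA table of past softened predictions per momentum value."""

    def __init__(self, num_samples, num_classes, momenta=(0.5, 0.9, 0.99)):
        self.momenta = momenta
        # One uniformly initialised (num_samples, num_classes) table per recollection.
        self.tables = [
            torch.full((num_samples, num_classes), 1.0 / num_classes)
            for _ in momenta
        ]

    @torch.no_grad()
    def update(self, indices, probs):
        """Blend the current softened predictions into every recollection."""
        for m, table in zip(self.momenta, self.tables):
            table[indices] = m * table[indices] + (1.0 - m) * probs

    def read(self, indices):
        """Return the stored distributions for a batch, one tensor per recollection."""
        return [table[indices] for table in self.tables]


def elastic_kd_loss(logits, targets, recollections, temperature=4.0):
    """Cross-entropy plus similarity-weighted KL terms, one per recollection."""
    ce = F.cross_entropy(logits, targets)
    student = F.softmax(logits / temperature, dim=1)
    log_student = F.log_softmax(logits / temperature, dim=1)

    # Weight each recollection by its cosine similarity to the current prediction,
    # normalised across recollections (a stand-in for the similarity-based
    # elastic weighting described in the abstract).
    sims = torch.stack(
        [F.cosine_similarity(student, r, dim=1) for r in recollections], dim=1
    )
    weights = F.softmax(sims, dim=1)  # shape: (batch, num_recollections)

    kd = 0.0
    for k, r in enumerate(recollections):
        kl = F.kl_div(log_student, r, reduction="none").sum(dim=1)  # per-sample KL
        kd = kd + (weights[:, k] * kl).mean()
    return ce + (temperature ** 2) * kd
```

In a training loop, one would read the recollections for the current batch indices, compute elastic_kd_loss, and then call bank.update with the detached softened predictions so the recollections lag the student; all of this is, again, only a sketch of how such a scheme could be wired up.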

Source
http://dx.doi.org/10.1109/TNNLS.2021.3107317

Publication Analysis

Top Keywords

elastic knowledge (8), knowledge distillation (8), training history (8), prior guide (8), guide training (8), training periods (8), training (6), distillation learning (4), learning recollection (4), recollection model (4)
