Audio-visual video recognition (AVVR) integrates audio and visual cues to categorize videos accurately. Current methods achieve satisfactory results on given datasets, but they struggle to retain knowledge of historical classes when new classes appear in real-world situations. No dedicated method addresses this issue, which prompts this paper to explore Class Incremental Audio-Visual Video Recognition (CIAVVR). CIAVVR aims to preserve the historical knowledge contained in stored data and learned models in order to prevent catastrophic forgetting. Audio-visual data and models are inherently hierarchical: the model contains both low-level and high-level semantic information, and the data includes snippet-level, video-level, and distribution-level spatial information. Fully exploiting these hierarchical structures is crucial for both data knowledge preservation and model knowledge preservation, yet existing image class incremental learning methods do not explicitly consider them. We therefore introduce Hierarchical Augmentation and Distillation (HAD), which comprises the Hierarchical Augmentation Module (HAM) and the Hierarchical Distillation Module (HDM). These modules efficiently exploit the hierarchical structure of data and models. Specifically, HAM uses a novel augmentation strategy, segmental feature augmentation, to preserve hierarchical model knowledge. HDM, in turn, employs newly designed hierarchical logical distillation (video-distribution) and hierarchical correlative distillation (snippet-video) to maintain intra-sample and inter-sample hierarchical knowledge. Evaluations on four benchmarks (AVE, AVK-100, AVK-200, and AVK-400) show that HAD effectively captures hierarchical information, which improves both the preservation of historical class knowledge and performance. We also provide a theoretical analysis that supports the segmental feature augmentation strategy.

Source
DOI: http://dx.doi.org/10.1109/TPAMI.2024.3387946

