In zero-shot learning (ZSL), attribute knowledge plays a vital role in transferring knowledge from seen classes to unseen classes. However, most existing ZSL methods learn biased attribute knowledge, which usually results in biased attribute prediction and a decline in zero-shot recognition performance. To solve this problem and learn unbiased attribute knowledge, we propose a visual attribute Transformer for zero-shot recognition (ZS-VAT), which is an effective and interpretable Transformer designed specifically for ZSL. In ZS-VAT, we design an attribute-head self-attention (AHSA) that is capable of learning unbiased attribute knowledge. Specifically, each attribute head in AHSA first transforms the local features into attribute-reinforced features and then accumulates the attribute knowledge from all corresponding reinforced features, reducing the mutual influence between attributes and avoiding information loss. AHSA finally preserves unbiased attribute knowledge through attribute embeddings. We also propose an attribute fusion model (AFM) that learns to recover the correct category knowledge from the attribute knowledge. In particular, AFM takes all features from AHSA as input and generates global embeddings. We carried out experiments to demonstrate that the attribute knowledge from AHSA and the category knowledge from AFM are able to assist each other. During the final semantic prediction, we combine the attribute embedding prediction (AEP) and global embedding prediction (GEP). We evaluated the proposed scheme on three benchmark datasets. ZS-VAT outperformed the state-of-the-art generalized ZSL (GZSL) methods on two datasets and achieved competitive results on the other dataset.
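The abstract does not give the equations behind AHSA, AFM, or the way AEP and GEP are combined, so the following is only a rough sketch of the general idea, not the authors' method. It assumes generic attention pooling with one query vector per attribute head, and a simple weighted average to fuse the two predictions. Every name here (`attribute_pool`, `attribute_scores`, `fuse`, `predict_class`, `alpha`) and all of the toy data are hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attribute_pool(features, query):
    # ASSUMPTION: one attribute head is modeled as scaled dot-product
    # attention pooling of the local features with a per-attribute query.
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, f) / scale for f in features])
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]

def attribute_scores(features, queries):
    # One score per attribute: each head pools the features on its own,
    # so the attributes do not share a single pooled representation.
    return [dot(attribute_pool(features, q), q) for q in queries]

def fuse(aep, gep, alpha=0.5):
    # ASSUMPTION: AEP and GEP are combined by a weighted average.
    return [alpha * a + (1 - alpha) * g for a, g in zip(aep, gep)]

def predict_class(semantic, class_signatures):
    # Choose the class whose attribute signature best matches the prediction.
    return max(class_signatures, key=lambda name: dot(semantic, class_signatures[name]))

# Toy data: two local features, two attributes, two candidate classes.
features = [[2.0, 0.0], [0.0, 1.0]]
queries = [[1.0, 0.0], [0.0, 1.0]]
aep = attribute_scores(features, queries)
gep = [0.5, 0.5]  # stand-in for a global embedding prediction
semantic = fuse(aep, gep)
label = predict_class(semantic, {"zebra": [1.0, 0.0], "horse": [0.0, 1.0]})
```

Because each head pools with its own query, a strong response for one attribute does not directly overwrite another, which is the intuition the abstract gives for reducing mutual influence between attributes.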
DOI: http://dx.doi.org/10.1109/TNNLS.2024.3386935