AI Article Synopsis

  • Generalized zero-shot learning (GZSL) aims to recognize unseen categories by leveraging knowledge from seen categories, but faces challenges due to mismatches between visual features and semantic attributes.
  • The issues arise mainly from attribute diversity (differences in the specificity of attributes) and instance diversity (variations in visual examples that share the same attributes), leading to ambiguity in visual representation.
  • To address these challenges, the authors introduce PSVMA+, a network that adapts visual and semantic elements at multiple granularity levels and selectively fuses features across those levels, improving recognition of unseen categories (a fusion step is sketched after this list).
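
To make the fusion idea concrete, here is a minimal PyTorch sketch of gated cross-granularity fusion. The class name, the scalar-gate design, and the dimensions are illustrative assumptions, not the authors' published formulation.

```python
import torch
import torch.nn as nn

class CrossGranularityFusion(nn.Module):
    """Gated fusion over features from several granularity levels.
    Illustrative sketch only; the gate design is an assumption,
    not the exact mechanism used in PSVMA+."""

    def __init__(self, dim: int):
        super().__init__()
        # Predict one scalar gate per level from that level's feature.
        self.gate = nn.Linear(dim, 1)

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        # feats: one (batch, dim) tensor per granularity level.
        stacked = torch.stack(feats, dim=1)                 # (B, L, D)
        weights = torch.softmax(self.gate(stacked), dim=1)  # (B, L, 1)
        # More reliable granularities receive larger weights; the
        # fused vector is their convex combination.
        return (weights * stacked).sum(dim=1)               # (B, D)
```

For example, `CrossGranularityFusion(512)` applied to a list of three (batch, 512) level features returns a single fused (batch, 512) representation.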

Article Abstract

Generalized zero-shot learning (GZSL) endeavors to identify the unseen categories using knowledge from the seen domain, necessitating the intrinsic interactions between the visual features and attribute semantic features. However, GZSL suffers from insufficient visual-semantic correspondences due to the attribute diversity and instance diversity. Attribute diversity refers to varying semantic granularity in attribute descriptions, ranging from low-level (specific, directly observable) to high-level (abstract, highly generic) characteristics. This diversity challenges the collection of adequate visual cues for attributes under a uni-granularity. Additionally, diverse visual instances corresponding to the same sharing attributes introduce semantic ambiguity, leading to vague visual patterns. To tackle these problems, we propose a multi-granularity progressive semantic-visual mutual adaption (PSVMA+) network, where sufficient visual elements across granularity levels can be gathered to remedy the granularity inconsistency. PSVMA+ explores semantic-visual interactions at different granularity levels, enabling awareness of multi-granularity in both visual and semantic elements. At each granularity level, the dual semantic-visual transformer module (DSVTM) recasts the sharing attributes into instance-centric attributes and aggregates the semantic-related visual regions, thereby learning unambiguous visual features to accommodate various instances. Given the diverse contributions of different granularities, PSVMA+ employs selective cross-granularity learning to leverage knowledge from reliable granularities and adaptively fuses multi-granularity features for comprehensive representations. Experimental results demonstrate that PSVMA+ consistently outperforms state-of-the-art methods.
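
The DSVTM described above alternates two attention directions: sharing attributes query visual patches to become instance-centric, and the patches then query those adapted attributes to aggregate semantic-related regions. Below is a minimal PyTorch sketch of one such dual cross-attention step; the class name `DSVTMSketch`, the head count, and the residual/LayerNorm placement are assumptions for illustration, not the published implementation.

```python
import torch
import torch.nn as nn

class DSVTMSketch(nn.Module):
    """One dual semantic-visual attention step (illustrative sketch)."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        # Semantic -> visual: sharing attributes query visual patches,
        # producing instance-centric attribute embeddings.
        self.s2v = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Visual -> semantic: patches query the adapted attributes,
        # aggregating semantic-related regions into refined features.
        self.v2s = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_a = nn.LayerNorm(dim)
        self.norm_v = nn.LayerNorm(dim)

    def forward(self, attrs: torch.Tensor, patches: torch.Tensor):
        # attrs:   (B, A, D) sharing-attribute embeddings
        # patches: (B, P, D) visual patch tokens
        inst_attrs, _ = self.s2v(attrs, patches, patches)
        inst_attrs = self.norm_a(attrs + inst_attrs)
        refined, _ = self.v2s(patches, inst_attrs, inst_attrs)
        refined = self.norm_v(patches + refined)
        return inst_attrs, refined
```

Stacking one such step per granularity level, then fusing the resulting features (as in the fusion sketch earlier), mirrors the progressive multi-granularity design the abstract describes.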

Source
http://dx.doi.org/10.1109/TPAMI.2024.3467229

Publication Analysis

Top Keywords

generalized zero-shot (8)
zero-shot learning (8)
visual (8)
visual features (8)
attribute diversity (8)
sharing attributes (8)
elements granularity (8)
granularity levels (8)
psvma+ (5)
granularity (5)
