AI Article Synopsis

Article Abstract

Weakly supervised object localization (WSOL), which trains object localization models using solely image category annotations, remains a challenging problem. Existing approaches based on convolutional neural networks (CNNs) tend to miss full object extent while activating discriminative object parts. Based on our analysis, this is caused by CNN's intrinsic characteristics, which experiences difficulty to capture object semantics at long distances. In this article, we introduce the vision transformer to WSOL, with the aim to capture long-range semantic dependency of features by leveraging transformer's cascaded self-attention mechanism. We propose the token semantic coupled attention map (TS-CAM) method, which first decomposes class-aware semantics and then couples the semantics with attention maps for semantic-aware activation. To capture object semantics at long distances and avoid partial activation, TS-CAM performs spatial embedding by partitioning an image to a set of patch tokens. To incorporate object category information to patch tokens, TS-CAM reallocates category-related semantics to each patch token. The patch tokens are finally coupled with attention maps which are semantic-agnostic to perform semantic-aware object localization. By introducing semantic tokens to produce semantic-aware attention maps, we further explore the capability of TS-CAM for multicategory object localization. Experiments show that TS-CAM outperforms its CNN-CAM counterpart by 11.6% and 28.9% on ILSVRC and CUB-200-2011 datasets, respectively, improving the state-of-the-art with large margins. TS-CAM also demonstrates superiority for multicategory object localization on the Pascal VOC dataset. The code is available at github.com/yuanyao366/ts-cam-extension.

Download full-text PDF

Source
http://dx.doi.org/10.1109/TNNLS.2022.3218471DOI Listing

Publication Analysis

Top Keywords

object localization
24
coupled attention
12
attention maps
12
patch tokens
12
object
11
token semantic
8
semantic coupled
8
attention map
8
weakly supervised
8
supervised object
8

Similar Publications

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!