Weakly supervised object localization (WSOL) is a challenging and promising task that aims to localize objects using only image-level category labels as supervision. In the absence of annotated bounding boxes, WSOL methods must exploit the intrinsic properties of the image classification pipeline to generate object localizations. In this work, we propose a WSOL method that explores the Intrinsic Discrimination and Consistency in the image classification pipeline, which we call IDC. First, we develop a Triplet Metrics Based Foreground Modeling (TMFM) framework that directly predicts object foreground regions using intrinsic discrimination. Unlike Class Activation Map (CAM) based methods, which also rely on intrinsic discrimination, our TMFM framework alleviates the problem of focusing only on the most discriminative parts by optimizing foreground and background regions synergistically. Second, we design a Dual Geometric Transformation Consistency Constraints (DGTC2) training strategy that leverages intrinsic geometric transformation consistency to introduce additional supervision and regularization constraints for WSOL. The proposed pixel-wise and object-wise consistency constraint losses provide this supervision at low cost, since it is derived from the training images themselves rather than from extra annotations. Extensive experiments show that our IDC method achieves significant and consistent performance gains over existing state-of-the-art WSOL approaches. Code is available at: https://github.com/vignywang/IDC.
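The two loss ideas summarized above can be illustrated with a minimal, hypothetical sketch. It is not the authors' implementation (that lives in the linked repository). The function names, the use of a horizontal flip as the geometric transformation, squared error as the pixel-wise penalty, and a Euclidean triplet margin loss over foreground/background feature vectors are all assumptions chosen for illustration.

```python
import math
from typing import Callable, List

Map = List[List[float]]  # hypothetical 2-D foreground map (rows x cols)
Vec = List[float]        # hypothetical pooled feature vector


def hflip(m: Map) -> Map:
    """Mirror a 2-D map left-to-right (assumed geometric transformation)."""
    return [list(reversed(row)) for row in m]


def flip_consistency_loss(predict: Callable[[Map], Map], image: Map) -> float:
    """Hypothetical pixel-wise consistency loss under horizontal flip.

    The prediction on the flipped image, flipped back, should agree with the
    prediction on the original image; the loss is their mean squared
    difference, so it needs no annotation beyond the image itself.
    """
    pred = predict(image)
    pred_back = hflip(predict(hflip(image)))
    n = sum(len(row) for row in pred)
    return sum(
        (a - b) ** 2
        for row_a, row_b in zip(pred, pred_back)
        for a, b in zip(row_a, row_b)
    ) / n


def l2(u: Vec, v: Vec) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor: Vec, positive: Vec, negative: Vec,
                        margin: float = 1.0) -> float:
    """Standard triplet margin loss (assumed stand-in for a triplet metric).

    Here the anchor and positive would be foreground features and the
    negative a background feature: the loss is zero once the background
    is at least `margin` farther from the anchor than the positive is.
    """
    return max(0.0, l2(anchor, positive) - l2(anchor, negative) + margin)
```

A flip-equivariant predictor (for example, per-pixel thresholding) yields zero consistency loss, while a predictor that depends on absolute column position is penalized. Similarly, the triplet term vanishes once foreground and background features are separated by the margin, and it stays positive while they remain close.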
DOI: http://dx.doi.org/10.1109/TIP.2024.3356174