Weakly-supervised object localization (WSOL) has gained popularity in recent years for its promise to train localization models with only image-level labels. Since the seminal WSOL work of class activation mapping (CAM), the field has focused on how to expand the attention regions to cover objects more broadly and localize them better. However, these strategies rely on full localization supervision for validating hyperparameters and model selection, which is in principle prohibited under the WSOL setup. In this paper, we argue that the WSOL task is ill-posed with only image-level labels, and propose a new evaluation protocol where full supervision is limited to only a small held-out set not overlapping with the test set. We observe that, under our protocol, the five most recent WSOL methods have not made a major improvement over the CAM baseline. Moreover, we report that existing WSOL methods have not reached the few-shot learning baseline, where the full supervision at validation time is used for model training instead. Based on our findings, we discuss some future directions for WSOL. Source code and dataset are available at https://github.com/clovaai/wsolevaluation.
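The held-out selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy predictor, and the IoU >= 0.5 hit criterion are all assumptions made for illustration. The point it shows is that a hyperparameter is chosen using box annotations from a small held-out set only, so that no test-set annotation takes part in the selection.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def localization_accuracy(
    preds: Sequence[Box], gts: Sequence[Box], iou_threshold: float = 0.5
) -> float:
    """Fraction of images whose predicted box overlaps the ground truth enough.

    The 0.5 threshold is an assumed, commonly used value.
    """
    hits = sum(box_iou(p, g) >= iou_threshold for p, g in zip(preds, gts))
    return hits / len(gts)


def select_on_held_out(
    candidates: Sequence[float],
    predict: Callable[[float], List[Box]],
    held_out_gt: Sequence[Box],
) -> float:
    """Return the candidate value with the best held-out localization accuracy.

    `predict(c)` is a hypothetical callable that returns one box per held-out
    image for hyperparameter value `c`. Test-set boxes are never passed in.
    """
    return max(
        candidates,
        key=lambda c: localization_accuracy(predict(c), held_out_gt),
    )


# Toy usage: two held-out images and a predictor whose box size scales with `c`.
held_out_gt = [(0.0, 0.0, 10.0, 10.0), (5.0, 5.0, 15.0, 15.0)]


def toy_predict(c: float) -> List[Box]:
    return [(0.0, 0.0, 10.0 * c, 10.0 * c), (5.0, 5.0, 5.0 + 10.0 * c, 5.0 + 10.0 * c)]


best = select_on_held_out([0.5, 1.0, 1.5], toy_predict, held_out_gt)
```

In this toy setup only `c = 1.0` reproduces the held-out boxes, so it is the value selected. The chosen value would then be frozen before any evaluation on the test set.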


Source
DOI: http://dx.doi.org/10.1109/TPAMI.2022.3169881


