In this paper, we formally address universal object detection, which aims to detect every category in every scene. Dependence on human annotations, limited visual information, and novel categories in the open world severely restrict the universality of detectors. We propose UniDetector, a universal object detector that recognizes an enormous number of categories in the open world. The critical points of UniDetector are: 1) it leverages images from multiple sources and heterogeneous label spaces in training through image-text alignment, which guarantees sufficient information for universal representations; 2) it involves heterogeneous supervision training, which alleviates the dependence on limited fully-labeled images; 3) it generalizes to the open world easily while keeping the balance between seen and unseen classes; 4) it further promotes generalization to novel categories through our proposed decoupling training manner and probability calibration. These contributions allow UniDetector to detect over 7k categories, the largest measurable size so far, with only about 500 classes participating in training. UniDetector exhibits strong zero-shot ability on large-vocabulary datasets: it surpasses supervised baselines by more than 5% without seeing any corresponding images. On 13 detection datasets with various scenes, UniDetector also achieves state-of-the-art performance with only 3% of the training data.
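The abstract names image-text alignment and probability calibration without giving formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a region is classified by a softmax over cosine similarities between its embedding and per-category text embeddings, and that calibration divides each probability by a category prior raised to a power `gamma` before renormalizing. The function names, the `temperature` and `gamma` values, and the renormalization step are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def classify_region(region_emb, text_embs, temperature=0.1):
    """Softmax over region-to-text cosine similarities (assumed alignment form)."""
    logits = [cosine(region_emb, t) / temperature for t in text_embs]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def calibrate(probs, priors, gamma=0.6):
    """Divide each probability by prior**gamma, then renormalize (assumed form)."""
    scaled = [p / (pi ** gamma) for p, pi in zip(probs, priors)]
    total = sum(scaled)
    return [s / total for s in scaled]


# Toy example: two categories, the second with a much lower prior.
probs = classify_region([1.0, 0.2], [[1.0, 0.0], [0.8, 0.6]])
calibrated = calibrate(probs, priors=[0.9, 0.1])
```

Under these assumptions, calibration shifts probability mass toward the low-prior category, which is the kind of rebalancing between seen and unseen classes the abstract describes.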

Source
http://dx.doi.org/10.1109/TPAMI.2024.3411595
