In the past few decades, we have witnessed the success of bag-of-features (BoF) models in scene classification, object detection, and image segmentation. However, it is also well acknowledged that BoF-based methods are limited by low-level feature encoding and coarse feature pooling. This paper proposes a novel scene classification method that leverages several semantic codebooks, learned in a multitask fashion, for robust feature encoding, and designs a context-aware image representation for efficient feature pooling. Unlike conventional universal codebook learning approaches, the proposed method encodes each class of local features with a unique semantic codebook, which captures the distinct distributions of different semantic classes more effectively. Instead of learning each semantic codebook separately, we learn a compact global codebook, of which each semantic codebook is a sparse subset, using a two-stage iterative multitask learning algorithm. The algorithm minimizes the clustering divergence while simultaneously solving the semantic codeword assignment by submodular optimization. Built upon the global and semantic codebooks, a context-aware image representation is further developed to encode both global and semantic features via contextual quantization, semantic response computation, and semantic pooling. Extensive experiments on various public benchmarks with several popular local features validate the effectiveness of the proposed method.
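To make the relationship between the global codebook and the semantic codebooks concrete, here is a minimal sketch, not the paper's algorithm. It assumes plain hard-assignment BoF histograms and hand-picked index subsets; it does not implement the multitask learning, the submodular codeword assignment, or the contextual quantization described in the abstract. All names (`nearest_codeword`, `bof_histogram`, `encode_image`, `semantic_subsets`) are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def nearest_codeword(feature: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codeword closest to `feature` (squared Euclidean)."""
    def sq_dist(codeword: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(feature, codeword))
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k]))


def bof_histogram(features: Sequence[Vector], codebook: Sequence[Vector]) -> List[float]:
    """L1-normalized hard-assignment histogram of local features over a codebook."""
    counts = [0.0] * len(codebook)
    for feature in features:
        counts[nearest_codeword(feature, codebook)] += 1.0
    total = sum(counts) or 1.0  # avoid division by zero for an empty image
    return [c / total for c in counts]


def encode_image(
    features: Sequence[Vector],
    global_codebook: Sequence[Vector],
    semantic_subsets: Dict[str, List[int]],
) -> List[float]:
    """Concatenate a global histogram with one histogram per semantic codebook.

    Assumption: each semantic codebook is given as a list of indices into the
    global codebook, i.e. a subset of the global codewords.
    """
    representation = bof_histogram(features, global_codebook)
    for name in sorted(semantic_subsets):  # fixed class order for a stable layout
        subset = [global_codebook[k] for k in semantic_subsets[name]]
        representation += bof_histogram(features, subset)
    return representation


# Toy usage with made-up 2-D "local features" and two illustrative classes.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
subsets = {"sky": [0, 1], "road": [2, 3]}
features = [[0.1, 0.0], [0.9, 0.1], [0.2, 0.9]]
vector = encode_image(features, codebook, subsets)
```

Storing each semantic codebook as indices into the global one keeps the total number of codewords fixed, which is one plausible reading of "compact global codebook" with "sparse subsets"; how the subsets are actually chosen is left to the paper's learning algorithm.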

Source
http://dx.doi.org/10.1109/TIP.2016.2607424
