In the past few decades, we have witnessed the success of bag-of-features (BoF) models in scene classification, object detection, and image segmentation. However, it is also well acknowledged that BoF-based methods are limited by low-level feature encoding and coarse feature pooling. This paper proposes a novel scene classification method that leverages several semantic codebooks learned in a multitask fashion for robust feature encoding and designs a context-aware image representation for efficient feature pooling. Unlike conventional universal codebook learning approaches, the proposed method encodes each class of local features with a unique semantic codebook, which captures the distinct distributions of different semantic classes more effectively. Instead of learning each semantic codebook separately, we learn a compact global codebook, of which each semantic codebook is a sparse subset, with a two-stage iterative multitask learning algorithm. While the clustering divergence is minimized, the semantic codeword assignment is simultaneously solved by submodular optimization. Built upon the global and semantic codebooks, a context-aware image representation is further developed to encode both global and semantic features via contextual quantization, semantic response computation, and semantic pooling. Extensive experiments on various public benchmarks with several popular local features validate the effectiveness of the proposed method.
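The relationship between a global codebook and semantic codebooks that are subsets of it can be illustrated with a minimal Python sketch. This is not the authors' algorithm: plain k-means stands in for the two-stage multitask learning, the subsets are fixed by hand instead of being chosen by submodular optimization, and hard nearest-codeword assignment with histogram pooling stands in for contextual quantization and semantic pooling. The function names and the class labels `sky` and `road` are hypothetical.

```python
import numpy as np


def learn_codebook(features, k, iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm). Assumption: a simple stand-in
    for the multitask codebook learning described in the abstract."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(features), size=k, replace=False)
    centers = features[start].astype(float)
    for _ in range(iters):
        # Squared distance from every feature to every center: shape (n, k).
        d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members):  # leave an empty cluster's center unchanged
                centers[j] = members.mean(axis=0)
    return centers


def bof_histogram(features, codebook):
    """Hard-assign each local feature to its nearest codeword, then pool
    the assignments into an L1-normalized histogram."""
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    idx = d2.argmin(axis=1)
    hist = np.bincount(idx, minlength=len(codebook)).astype(float)
    return hist / hist.sum()


def semantic_histograms(features, global_codebook, subsets):
    """Encode the same features once per semantic class, using only the
    codewords of the global codebook listed for that class."""
    return {
        name: bof_histogram(features, global_codebook[ids])
        for name, ids in subsets.items()
    }


def demo():
    rng = np.random.default_rng(0)
    train = rng.normal(size=(200, 8))   # synthetic local descriptors
    codebook = learn_codebook(train, k=6)
    # Hypothetical hand-picked subsets; the paper learns these instead.
    subsets = {"sky": np.array([0, 1, 2]), "road": np.array([2, 3, 4, 5])}
    image = rng.normal(size=(50, 8))    # descriptors of one synthetic image
    global_hist = bof_histogram(image, codebook)
    sem = semantic_histograms(image, codebook, subsets)
    # Concatenate the global and per-class histograms into one vector.
    rep = np.concatenate([global_hist] + [sem[n] for n in sorted(sem)])
    return codebook, global_hist, sem, rep
```

Calling `demo()` returns a 13-dimensional representation: 6 global bins plus 4 and 3 class-specific bins. Sharing codewords across classes (index 2 in this sketch) is what allows the global codebook to stay compact.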
DOI: http://dx.doi.org/10.1109/TIP.2016.2607424
Sensors (Basel)
January 2025
State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China.
Satellite-ground communication is a critical component in the global communication system, significantly contributing to environmental monitoring, radio and television broadcasting, aerospace operations, and other domains. However, the technology encounters challenges in data transmission efficiency, due to the drastic alterations in the communication channel caused by the rapid movement of satellites. In comparison to traditional transmission methods, semantic communication (SemCom) technology enhances transmission efficiency by comprehending and leveraging the intrinsic meaning of information, making it ideal for image transmission in satellite communications.
J Med Internet Res
August 2024
School of Software, Shanghai Jiao Tong University, Shanghai, China.
Background: Nowadays, social media plays a crucial role in disseminating information about cancer prevention and treatment. A growing body of research has focused on assessing access and communication effects of cancer information on social media. However, there remains a limited understanding of the comprehensive presentation of cancer prevention and treatment methods across social media platforms.
IEEE Trans Pattern Anal Mach Intell
December 2024
We introduce PICFormer, a novel framework for Pluralistic Image Completion using a transFormer based architecture, that achieves both high quality and diversity at a much faster inference speed. Our key contribution is to introduce a code-shared codebook learning using a restrictive CNN on small and non-overlapping receptive fields (RFs) for the local visible token representation. This results in a compact yet expressive discrete representation, facilitating efficient modeling of global visible context relations by the transformer.
Neural Netw
August 2024
The Ministry of Education Key Laboratory of Precision Opto-Mechatronics Technology, School of Instrumentation and Optoelectronic Engineering, Beihang University, Beijing, 100191, China.
Template matching pose estimation methods based on deep learning have made significant advancements via metric learning or reconstruction learning. Existing approaches primarily build distinct template representation libraries (codebooks) from rendered images for each object, which complicate the training process and increase memory cost for multi-object tasks. Additionally, they struggle to effectively handle discrepancies between the distributions of training and test sets, particularly for occluded objects, resulting in suboptimal matching accuracy.
J Hosp Med
June 2024
Division of General Internal Medicine, Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania, USA.
Background: Formulating a thoughtful problem representation (PR) is fundamental to sound clinical reasoning and an essential component of medical education. Aside from basic structural recommendations, little consensus exists on what characterizes high-quality PRs.
Objectives: To elucidate characteristics that distinguish PRs created by experts and novices.