Multi-label zero-shot learning (ML-ZSL) strives to recognize all objects in an image, regardless of whether they are present in the training data. Recent methods incorporate an attention mechanism to locate labels in the image and generate class-specific semantic information. However, an attention mechanism built on visual features treats all label embeddings equally in the prediction score, leading to severe semantic ambiguity. This study focuses on using semantic information efficiently within the attention mechanism. We propose a contrastive label-based attention method (CLA) that associates each label with the most relevant image regions. Specifically, our label-based attention, guided by the latent label embedding, captures discriminative image details. To distinguish region-wise correlations, we implement a region-level contrastive loss. In addition, we use a global feature alignment module to identify labels with general information. Extensive experiments on two benchmarks, NUS-WIDE and Open Images, demonstrate that CLA outperforms state-of-the-art methods. In particular, under the ZSL setting, our method improves mean Average Precision (mAP) by 2.0% on NUS-WIDE and 4.0% on Open Images compared with recent methods.
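
As a rough illustration of what attention guided by label embeddings can look like, the sketch below lets each label attend over image region features and then scores the label against the attended feature. This is not the authors' implementation: the abstract does not give the formulation, so the scaled dot-product attention, the function names (softmax, dot, label_attention) and the toy vectors are all assumptions made for illustration. The contrastive loss and the global alignment module are not shown.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    # Inner product of two equal-length vectors.
    return sum(a * b for a, b in zip(u, v))


def label_attention(regions, label_embeddings):
    """Hypothetical label-guided attention (an assumption, not the paper's code).

    regions: list of R region feature vectors, each of length d
    label_embeddings: list of L label vectors, each of length d
    Returns (scores, weights), where scores[l] is the prediction score of
    label l and weights[l][r] is the attention of label l on region r.
    """
    d = len(regions[0])
    scale = math.sqrt(d)
    scores, weights = [], []
    for label in label_embeddings:
        # Each label queries every region, so attention differs per label.
        logits = [dot(label, region) / scale for region in regions]
        attn = softmax(logits)
        # Label-specific image feature: attention-weighted sum of regions.
        attended = [
            sum(w * region[i] for w, region in zip(attn, regions))
            for i in range(d)
        ]
        scores.append(dot(attended, label))
        weights.append(attn)
    return scores, weights


# Toy data: three regions and two labels in a 2-dimensional space.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
labels = [[4.0, 0.0], [0.0, 4.0]]
scores, weights = label_attention(regions, labels)
```

In this toy setup each label places most of its attention on the region aligned with its own embedding, which is the behaviour the abstract describes as associating each label with the most relevant image regions.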

Source
http://dx.doi.org/10.1142/S0129065725500108
