Cross-modal hashing encodes different modalities of multimodal data into low-dimensional Hamming space for fast cross-modal retrieval. In multi-label cross-modal retrieval, multimodal data are often annotated with multiple labels, and some labels, e.g.", ocean" and "cloud", often co-occur. However, existing cross-modal hashing methods overlook label dependency that is crucial for improving performance. To fulfill this gap, this article proposes graph convolutional multi-label hashing (GCMLH) for effective multi-label cross-modal retrieval. Specifically, GCMLH first generates word embedding of each label and develops label encoder to learn highly correlated label embedding via graph convolutional network (GCN). In addition, GCMLH develops feature encoder for each modality, and feature fusion module to generate highly semantic feature via GCN. GCMLH uses teacher-student learning scheme to transfer knowledge from the teacher modules, i.e., label encoder and feature fusion module, to the student module, i.e., feature encoder, such that learned hash code can well exploit multi-label dependency and multimodal semantic structure. Extensive empirical results on several benchmarks demonstrate the superiority of the proposed method over existing state-of-the-arts.

Download full-text PDF

Source
http://dx.doi.org/10.1109/TNNLS.2024.3421583DOI Listing

Publication Analysis

Top Keywords

cross-modal retrieval
16
graph convolutional
12
convolutional multi-label
8
multi-label hashing
8
cross-modal hashing
8
multimodal data
8
multi-label cross-modal
8
label encoder
8
feature encoder
8
feature fusion
8

Similar Publications

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!