Cross-modal hashing encodes different modalities of multimodal data into low-dimensional Hamming space for fast cross-modal retrieval. In multi-label cross-modal retrieval, multimodal data are often annotated with multiple labels, and some labels, e.g.", ocean" and "cloud", often co-occur. However, existing cross-modal hashing methods overlook label dependency that is crucial for improving performance. To fulfill this gap, this article proposes graph convolutional multi-label hashing (GCMLH) for effective multi-label cross-modal retrieval. Specifically, GCMLH first generates word embedding of each label and develops label encoder to learn highly correlated label embedding via graph convolutional network (GCN). In addition, GCMLH develops feature encoder for each modality, and feature fusion module to generate highly semantic feature via GCN. GCMLH uses teacher-student learning scheme to transfer knowledge from the teacher modules, i.e., label encoder and feature fusion module, to the student module, i.e., feature encoder, such that learned hash code can well exploit multi-label dependency and multimodal semantic structure. Extensive empirical results on several benchmarks demonstrate the superiority of the proposed method over existing state-of-the-arts.
Download full-text PDF |
Source |
---|---|
http://dx.doi.org/10.1109/TNNLS.2024.3421583 | DOI Listing |
Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!