Publications by Haiwen Diao

GSSF: Generalized Structural Sparse Function for Deep Cross-Modal Metric Learning.

Haiwen Diao, Ying Zhang, Shang Gao, Jiawen Zhu, Long Chen

IEEE Trans Image Process

November 2024

Cross-modal metric learning is a prominent research topic that bridges the semantic heterogeneity between vision and language. Existing methods frequently use either simple cosine similarity or complex distance metrics to transform pairwise features into a similarity score, which leaves the distance measurement either inadequate or inefficient. We therefore propose a Generalized Structural Sparse Function (GSSF) that dynamically captures thorough and powerful relationships across modalities for pairwise similarity learning while remaining concise and efficient.

Deep Boosting Learning: A Brand-New Cooperative Approach for Image-Text Matching.

Haiwen Diao, Ying Zhang, Shang Gao, Xiang Ruan, Huchuan Lu

IEEE Trans Image Process

May 2024

Image-text matching remains a challenging task due to heterogeneous semantic diversity across modalities and insufficient distance separability within triplets. Unlike previous approaches, which focus on enhancing multi-modal representations or exploiting cross-modal correspondence for more accurate retrieval, this paper leverages knowledge transfer between peer branches in a boosting manner to obtain a more powerful matching model. Specifically, we propose a brand-new Deep Boosting Learning (DBL) algorithm, in which an anchor branch is first trained to provide insights into the data properties, and a target branch then builds on this more advanced knowledge to develop optimal features and distance metrics.

Plug-and-Play Regulators for Image-Text Matching.

Haiwen Diao, Ying Zhang, Wei Liu, Xiang Ruan, Huchuan Lu

IEEE Trans Image Process

April 2023

Exploiting fine-grained correspondence and visual-semantic alignments has shown great potential in image-text matching. Generally, recent approaches first employ a cross-modal attention unit to capture latent region-word interactions, and then integrate all the alignments to obtain the final similarity. However, most of them adopt one-time forward association or aggregation strategies with complex architectures or additional information, while ignoring the regulation ability of network feedback.
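
The generic two-step pipeline described in this abstract (cross-modal attention over region-word pairs, then aggregation of the alignments into one score) can be illustrated with a minimal NumPy sketch. This is not the paper's method or its regulators: the function names, the `temperature` value, and the mean-pooling aggregation are all illustrative assumptions.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors along `axis` to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def attend_and_aggregate(regions, words, temperature=9.0):
    """One-pass region-word attention followed by mean aggregation.

    regions: (R, D) image region features.
    words:   (W, D) word features.
    temperature: hypothetical sharpening factor, chosen only for illustration.
    Returns a scalar image-text similarity.
    """
    regions = l2_normalize(regions)
    words = l2_normalize(words)

    # Step 1: region-word interactions as cosine affinities, shape (W, R).
    affinity = words @ regions.T

    # Each word attends over all regions and receives an attended visual vector.
    attention = softmax(temperature * affinity, axis=1)
    attended = attention @ regions  # (W, D)

    # Step 2: per-word alignment score, then aggregate all alignments by averaging.
    word_scores = np.sum(l2_normalize(attended) * words, axis=1)  # (W,)
    return float(word_scores.mean())


# Toy usage: words that coincide with regions score higher than unrelated ones.
regions = np.eye(4)
matched = attend_and_aggregate(regions, np.eye(4))
mismatched = attend_and_aggregate(regions[2:], np.eye(4)[:2])
print(matched > mismatched)
```

The sketch runs the attention and aggregation once in a single forward pass, which is the "one-time forward association or aggregation" pattern the abstract contrasts with feedback-based regulation.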