Cross-Modal Metric Learning for AUC Optimization.

IEEE Trans Neural Netw Learn Syst

Published: October 2018

Cross-modal metric learning (CML) deals with learning distance functions for matching cross-modal data. Existing methods mostly focus on minimizing a loss defined over sample pairs. However, the numbers of intraclass and interclass sample pairs can be highly imbalanced in many applications, which can lead to degraded or unsatisfactory performance. The area under the receiver operating characteristic curve (AUC) is a more meaningful performance measure under such imbalanced distributions. To tackle this problem, and to make samples from different modalities directly comparable, a CML method is presented that directly maximizes AUC. The method extends naturally to optimizing partial AUC (pAUC), the AUC between two specific false positive rates (FPRs), which is particularly useful in applications where only performance within a predefined false positive range is critical. The proposed method is formulated as a log-determinant regularized semidefinite optimization problem, and a minibatch proximal point algorithm is developed for efficient optimization. Experiments verify that the algorithm is stable with respect to the number of sampled pairs that form a minibatch at each iteration. Several data sets are used for evaluation, including three cross-modal face recognition data sets covering various scenarios and a single-modal data set, Labeled Faces in the Wild. The results demonstrate the effectiveness of the proposed methods and marked improvements over existing methods. In particular, pAUC-optimized CML proves more competitive on measures such as Rank-1 accuracy and verification rate at FPR = 0.1%.
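The abstract outlines three computational ingredients: a cross-modal distance parameterized by a positive semidefinite (PSD) matrix, an AUC/pAUC objective restricted to an FPR band, and a log-determinant regularized semidefinite program solved by a minibatch proximal point algorithm. The Python sketch below is a minimal illustration of these pieces, not the authors' implementation: `cross_modal_dist`, `partial_auc`, `logdet_reg`, and `project_psd` are hypothetical names, the empirical pAUC estimate and the toy data are assumptions, and the eigenvalue-clipping projection merely stands in for the proximal step onto the PSD cone.

```python
# Illustrative sketch of the concepts in the abstract (assumed names/data,
# not the published algorithm).
import numpy as np

def cross_modal_dist(x, y, M):
    """Squared Mahalanobis-style distance d_M(x, y) = (x - y)^T M (x - y),
    with M symmetric positive semidefinite, applied to a cross-modal pair."""
    d = x - y
    return float(d @ M @ d)

def partial_auc(scores_pos, scores_neg, fpr_low=0.0, fpr_high=0.001):
    """Simple empirical pAUC estimate restricted to FPR in [fpr_low, fpr_high].
    scores_pos: similarity scores of intraclass (matching) pairs.
    scores_neg: similarity scores of interclass (non-matching) pairs."""
    neg_sorted = np.sort(scores_neg)[::-1]            # descending: hardest negatives first
    n_neg = len(neg_sorted)
    lo = int(np.floor(fpr_low * n_neg))
    hi = int(np.ceil(fpr_high * n_neg))
    hard_neg = neg_sorted[lo:hi]                      # negatives falling inside the FPR band
    if len(hard_neg) == 0:
        return 0.0
    # Fraction of (positive, in-band negative) pairs ranked correctly.
    return float((scores_pos[:, None] > hard_neg[None, :]).mean())

def logdet_reg(M, eps=1e-6):
    """Log-determinant regularizer toward the identity: tr(M) - logdet(M) - d.
    eps guards the logdet when M is near rank deficient."""
    d = M.shape[0]
    _, logdet = np.linalg.slogdet(M + eps * np.eye(d))
    return float(np.trace(M) - logdet - d)

def project_psd(M):
    """Projection onto the PSD cone by eigenvalue clipping (stands in for the
    proximal step on the semidefinite constraint)."""
    w, V = np.linalg.eigh((M + M.T) / 2)
    return (V * np.clip(w, 0.0, None)) @ V.T

# Toy usage: random pairs under the identity metric.
rng = np.random.default_rng(0)
d = 16
M = project_psd(np.eye(d))
pos = -np.array([cross_modal_dist(x, x + 0.1 * rng.standard_normal(d), M)
                 for x in rng.standard_normal((200, d))])
neg = -np.array([cross_modal_dist(rng.standard_normal(d), rng.standard_normal(d), M)
                 for _ in range(2000)])
print("pAUC in FPR band [0, 0.1]:", partial_auc(pos, neg, 0.0, 0.1))
print("log-det regularizer:", logdet_reg(M))
```

In a full method along the lines the abstract describes, M would be updated on minibatches of sampled intraclass/interclass pairs using a differentiable surrogate of the (p)AUC term, with the log-determinant regularizer keeping M well conditioned and the PSD projection applied after each update.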

Source: http://dx.doi.org/10.1109/TNNLS.2017.2769128


Similar Publications

Self-critical strategy adjustment based artificial intelligence method in generating diagnostic reports of respiratory diseases.

Physiol Meas

January 2025

Academy of Military Science of the People's Liberation Army, Beijing, 100073, China.

Objective: Humanity faces many health challenges, among which respiratory diseases are one of the leading causes of death. Existing AI-driven pre-diagnosis approaches can improve diagnostic efficiency but still face challenges. For example, single-modal data suffer from information redundancy or loss, making it difficult to learn relationships between features and to reveal the obscure characteristics of complex diseases.


DICCR: Double-gated intervention and confounder causal reasoning for vision-language navigation.

Neural Netw

December 2024

School of Computer and Electronic Information, Guangxi University, University Road, Nanning, 530004, Guangxi, China.

Vision-language navigation (VLN) is a challenging task that requires agents to capture correlations between modalities from redundant information according to instructions, and then make sequential decisions over visual scenes and text instructions in the action space. Recent research has focused on extracting visual features and enhancing textual knowledge while ignoring potential bias in multi-modal data and spurious correlations between vision and text. This paper therefore studies the relational structure of multi-modal data from the perspective of causality and weakens the potential correlations between modalities through cross-modal causal reasoning.


Generating accurate and contextually rich captions for images and videos is essential for various applications, from assistive technology to content recommendation. However, challenges such as maintaining temporal coherence in videos, reducing noise in large-scale datasets, and enabling real-time captioning remain significant. We introduce MIRA-CAP (Memory-Integrated Retrieval-Augmented Captioning), a novel framework designed to address these issues through three core innovations: a cross-modal memory bank, adaptive dataset pruning, and a streaming decoder.


AutoFOX: An automated cross-modal 3D fusion framework of coronary X-ray angiography and OCT.

Med Image Anal

December 2024

School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200030, China; Department of Cardiovascular Medicine, University of Oxford, OX39DU, UK.

Article Synopsis
  • Coronary artery disease (CAD) is a major global health issue, and combining coronary X-ray angiography (XA) with optical coherence tomography (OCT) can enhance diagnosis and treatment by providing detailed images of coronary anatomy and plaque structure.
  • The new framework, AutoFOX, employs a deep learning model called TransCAN to accurately align 3D vascular images, achieving impressive alignment precision, especially at critical anatomical points.
  • AutoFOX also includes innovative methods for reconstructing side branches and utilizes a diverse dataset for validation, demonstrating strong accuracy and reliability in assessing bifurcation lesions, which is essential for improving CAD management and procedures.

Dual-modality visual feature flow for medical report generation.

Med Image Anal

December 2024

Chongqing Key Laboratory of Image Cognition, College of Computer Science and Technology, Chongqing University of Posts and Telecommunication, Chongqing, 400065, China.

Medical report generation is a cross-modal task that generates medical text to provide professional descriptions of medical images in clinical language. Although some methods have made progress, limitations remain, including insufficient focus on lesion areas, omission of internal edge features, and difficulty in aligning cross-modal data. To address these issues, we propose Dual-Modality Visual Feature Flow (DMVF) for medical report generation.

