In this study, we propose Multimodal Fusion-supervised Cross-modality Alignment Perception (MulFS-CAP), a novel framework for single-stage fusion of unregistered infrared-visible images. Traditional two-stage methods depend on explicit registration algorithms to align the source images spatially, which often adds complexity. In contrast, MulFS-CAP seamlessly blends implicit registration with fusion, simplifying the pipeline and making it better suited to practical applications. MulFS-CAP uses a shared shallow feature encoder to merge unregistered infrared-visible images in a single stage. To address the distinct requirements of feature-level alignment and fusion, we develop a consistent feature learning approach based on a learnable modality dictionary. This dictionary supplies complementary information to the unimodal features, maintaining consistency between the individual features and the fused multimodal features. As a result, MulFS-CAP effectively reduces the impact of modality variance on cross-modality feature alignment, enabling simultaneous registration and fusion. Additionally, MulFS-CAP introduces a novel cross-modality alignment approach that builds a correlation matrix describing pixel-level relationships between the source images; this matrix guides the alignment of infrared and visible features and further refines the fusion process. Together, these designs make MulFS-CAP lightweight, effective, and free of explicit registration. Experimental results on multiple datasets demonstrate the effectiveness of the proposed method and its superiority over state-of-the-art two-stage methods. The source code of our method is available at https://github.com/YR0211/MulFS-CAP.
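To make the correlation-matrix idea above concrete, the following is a minimal PyTorch sketch that computes an attention-style pixel-wise correlation matrix between infrared and visible feature maps and uses it to resample the infrared features toward the visible ones. The function name, tensor shapes, and overall structure are illustrative assumptions for this sketch, not the released MulFS-CAP implementation.

# Minimal sketch of correlation-matrix-based cross-modality feature alignment.
# Names and structure are illustrative assumptions, not the authors' code.
import torch
import torch.nn.functional as F


def align_by_correlation(feat_ir: torch.Tensor, feat_vis: torch.Tensor) -> torch.Tensor:
    """Resample infrared features toward visible features via a pixel-wise correlation matrix.

    feat_ir, feat_vis: (B, C, H, W) feature maps, e.g. from a shared shallow encoder.
    """
    b, c, h, w = feat_vis.shape
    q = feat_vis.flatten(2).transpose(1, 2)   # (B, HW, C) queries from the visible image
    k = feat_ir.flatten(2)                    # (B, C, HW) keys from the infrared image
    v = feat_ir.flatten(2).transpose(1, 2)    # (B, HW, C) values from the infrared image

    # Correlation matrix: similarity of every visible pixel to every infrared pixel.
    corr = torch.bmm(q, k) / (c ** 0.5)       # (B, HW, HW)
    attn = F.softmax(corr, dim=-1)

    aligned_ir = torch.bmm(attn, v)           # (B, HW, C): infrared content gathered per visible pixel
    return aligned_ir.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    ir, vis = torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32)
    print(align_by_correlation(ir, vis).shape)   # torch.Size([2, 64, 32, 32])

Because the softmax over the correlation matrix is differentiable, an alignment of this form can in principle be trained jointly with a downstream fusion loss, which is consistent with the single-stage registration-and-fusion setting described above.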

Source
http://dx.doi.org/10.1109/TPAMI.2025.3535617

Publication Analysis

Top Keywords

cross-modality alignment (12)
unregistered infrared-visible (12)
multimodal fusion-supervised (8)
fusion-supervised cross-modality (8)
alignment perception (8)
infrared-visible images (8)
two-stage methods (8)
source images (8)
registration fusion (8)
mulfs-cap (7)

Similar Publications

Recent advancements in Multimodal Large Language Models (MLLMs) underscore the importance of scaling models and data to boost performance, yet doing so often incurs substantial computational costs. Although the Mixture of Experts (MoE) architecture has been employed to scale large language or visual-language models efficiently, these efforts typically involve a small number of experts and a limited set of modalities. To address this, our work presents a pioneering attempt to develop a unified MLLM with the MoE architecture, named Uni-MoE, which can handle a wide array of modalities.
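For readers unfamiliar with the Mixture of Experts idea mentioned above, the following is a generic top-k expert-routing layer in PyTorch. It is a textbook-style sketch, not the Uni-MoE architecture; the class name and hyperparameters (num_experts, k) are assumptions made for illustration.

# Generic top-k Mixture-of-Experts layer (illustrative sketch, not Uni-MoE).
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    def __init__(self, dim: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(dim, num_experts)  # router scores each token per expert
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim); each token is dispatched to its top-k experts.
        scores = F.softmax(self.gate(x), dim=-1)             # (tokens, num_experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)  # (tokens, k)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e
                if mask.any():
                    out[mask] += topk_scores[mask, slot:slot + 1] * expert(x[mask])
        return out


if __name__ == "__main__":
    layer = TopKMoE(dim=256)
    print(layer(torch.randn(16, 256)).shape)   # torch.Size([16, 256])

Because only k experts run for each token, the parameter count can grow with the number of experts while per-token compute stays roughly constant, which is the efficiency argument behind scaling models this way.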

Existing studies of multi-modality medical image segmentation tend to aggregate all modalities without discrimination and employ multiple symmetric encoders or decoders for feature extraction and fusion. They often overlook the differing contributions that individual modalities make to visual representation and decision making. Motivated by this observation, this paper proposes an asymmetric adaptive heterogeneous network for multi-modality image feature extraction with modality discrimination and adaptive fusion.
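As a rough illustration of the modality-aware adaptive fusion the abstract alludes to, the sketch below predicts one contribution weight per modality per sample and uses it to combine the modality features; the module and its gating design are assumptions made for this sketch, not the paper's actual network.

# Illustrative sketch of adaptive, modality-weighted feature fusion.
import torch
import torch.nn as nn


class AdaptiveModalityFusion(nn.Module):
    def __init__(self, channels: int, num_modalities: int):
        super().__init__()
        # A small gating head predicts one scalar weight per modality per sample.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(channels * num_modalities, num_modalities),
            nn.Softmax(dim=-1),
        )

    def forward(self, feats: list) -> torch.Tensor:
        # feats: list of (B, C, H, W) feature maps, one per modality.
        stacked = torch.stack(feats, dim=1)            # (B, M, C, H, W)
        weights = self.gate(torch.cat(feats, dim=1))   # (B, M), sums to 1 over modalities
        weights = weights.view(*weights.shape, 1, 1, 1)
        return (weights * stacked).sum(dim=1)          # (B, C, H, W) fused features


if __name__ == "__main__":
    fuse = AdaptiveModalityFusion(channels=32, num_modalities=3)
    feats = [torch.randn(2, 32, 24, 24) for _ in range(3)]
    print(fuse(feats).shape)   # torch.Size([2, 32, 24, 24])

The point of the gate is that modalities contribute unequally from sample to sample, so the network learns per-sample weights instead of aggregating all modalities uniformly.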

RGB-Thermal Salient Object Detection (RGB-T SOD) aims to pinpoint prominent objects within aligned pairs of visible and thermal infrared images. A key challenge lies in bridging the inherent disparities between the RGB and thermal modalities for effective saliency map prediction. Traditional encoder-decoder architectures, although designed for cross-modality feature interaction, may not adequately account for noise originating from defective modalities, leading to suboptimal performance in complex scenarios.

Cerebrovascular segmentation from time-of-flight magnetic resonance angiography (TOF-MRA) and computed tomography angiography (CTA) is essential for providing supportive information for diagnosing and planning the treatment of multiple intracranial vascular diseases. Different imaging modalities rely on distinct principles to visualize the cerebral vasculature, which leads to expensive annotation requirements and to performance degradation when training and deploying deep learning models. In this paper, we propose CereTS, an unsupervised domain adaptation framework that performs translation and segmentation of cross-modality unpaired cerebral angiography.
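To make the idea of translating and segmenting unpaired cross-modality data concrete, here is a generic sketch of translation-based unsupervised domain adaptation for segmentation. It is not the CereTS framework: the networks are toy stand-ins, the discriminator update and any cycle-consistency terms are omitted for brevity, and all names are assumptions.

# Generic sketch of translation-based unsupervised domain adaptation for segmentation.
import torch
import torch.nn as nn


def conv_block(cin: int, cout: int) -> nn.Sequential:
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU())


# Toy stand-in networks (real ones would be far deeper).
translator = nn.Sequential(conv_block(1, 16), nn.Conv2d(16, 1, 3, padding=1))  # source -> target appearance
discriminator = nn.Sequential(conv_block(1, 16), nn.AdaptiveAvgPool2d(1),
                              nn.Flatten(), nn.Linear(16, 1))                  # real/fake in the target domain
segmenter = nn.Sequential(conv_block(1, 16), nn.Conv2d(16, 2, 1))              # 2-class vessel mask

bce = nn.BCEWithLogitsLoss()
ce = nn.CrossEntropyLoss()

src = torch.randn(2, 1, 64, 64)                # e.g. annotated source-domain (TOF-MRA) patches
src_mask = torch.randint(0, 2, (2, 64, 64))    # their vessel labels
# Unpaired target-domain (e.g. CTA) patches would be used to train the discriminator.

fake_tgt = translator(src)
# Generator-side adversarial loss: translated patches should look like the target domain.
loss_adv = bce(discriminator(fake_tgt), torch.ones(2, 1))
# Segmentation loss: reuse the source labels on the translated patches.
loss_seg = ce(segmenter(fake_tgt), src_mask)
(loss_adv + loss_seg).backward()
print(float(loss_adv), float(loss_seg))

The translated images inherit the source annotations, which is what lets a segmenter be trained for the target modality without target-domain labels.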
