Recently, multimodal large language models (MLLMs), represented by GPT-4V, have become a rising research hotspot; they use powerful large language models (LLMs) as a brain to perform multimodal tasks. The surprising emergent capabilities of MLLMs, such as writing stories based on images and math reasoning without optical character recognition (OCR), are rare in traditional multimodal methods, suggesting a potential path to artificial general intelligence. To this end, both academia and industry have endeavored to develop MLLMs that can compete with or even outperform GPT-4V, pushing the limit of research at a surprising speed.
IEEE Trans Pattern Anal Mach Intell
July 2023
Cross-spectral face hallucination is an intuitive way to mitigate the modality discrepancy in Heterogeneous Face Recognition (HFR). However, due to imaging differences, the hallucination inevitably suffers from a shape misalignment between paired heterogeneous images. Rather than building complicated architectures to circumvent the problem, as previous works do, we propose a simple yet effective method called Shape Alignment FacE (SAFE).
IEEE Trans Pattern Anal Mach Intell
June 2022
Heterogeneous face recognition (HFR) refers to matching cross-domain faces and plays a crucial role in public security. Nevertheless, HFR is confronted with challenges from large domain discrepancy and insufficient heterogeneous data. In this paper, we formulate HFR as a dual generation problem, and tackle it via a novel dual variational generation (DVG-Face) framework.