Visible-Infrared Person Re-identification (VI-ReID) has been consistently challenged by the significant intra-class variations and cross-modality differences between different cameras. Therefore, the key lies in how to extract discriminative modality-shared features. Existing VI-ReID methods based on Convolutional Neural Networks (CNN) and Vision Transformers (ViT) have shortcomings in capturing global features and controlling computational complexity, respectively.
View Article and Find Full Text PDF