Recently, contrastive learning has made significant progress in learning visual representations from unlabeled data. The core idea is to train the backbone to be invariant to different augmentations of an instance. While most methods only maximize the feature similarity between two augmented views, we further generate more challenging training samples and force the model to predict the aggregated representation for these hard samples. In this article, we propose MixIR, a mixture-based approach built upon the traditional Siamese network. On the one hand, we feed two augmented images of an instance to the backbone and obtain the aggregated representation by taking the elementwise maximum of the two features. On the other hand, we take the mixture of these augmented images as input and expect the model's prediction to be close to the aggregated representation. In this way, the model is exposed to a wider variety of samples of an instance while being trained to predict invariant representations for them. As a result, the learned model is more discriminative than those of previous contrastive learning methods. Extensive experiments on large-scale datasets show that MixIR steadily improves the baseline and achieves results competitive with state-of-the-art methods. Our code is available at https://github.com/happytianhao/MixIR.
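
To make the training objective concrete, below is a minimal PyTorch-style sketch of the mixing-and-aggregation idea described in the abstract. The `predictor` head, the Beta-sampled mixing coefficient `lam`, the stop-gradient on the target branch, and the negative-cosine loss are illustrative assumptions not specified in the abstract; consult the linked repository for the authors' actual implementation.

```python
# Minimal sketch of the MixIR objective (assumptions noted above).
import torch
import torch.nn.functional as F

def mixir_loss(backbone, predictor, x1, x2):
    """x1, x2: two augmented views of the same batch of images."""
    # Target branch: aggregated representation as the elementwise
    # maximum of the two view features (stop-gradient is an assumption).
    with torch.no_grad():
        target = torch.maximum(backbone(x1), backbone(x2))

    # Mixed input: convex combination of the two augmented images,
    # with a Beta-sampled mixing coefficient (hypothetical choice).
    lam = torch.distributions.Beta(1.0, 1.0).sample().item()
    x_mix = lam * x1 + (1.0 - lam) * x2

    # Online branch: push the prediction on the mixed image toward the
    # aggregated target (negative cosine similarity, an assumed loss).
    p = predictor(backbone(x_mix))
    return -F.cosine_similarity(p, target, dim=-1).mean()
```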

Source: http://dx.doi.org/10.1109/TNNLS.2024.3439538

