Matching hand-drawn sketches with photos (a.k.a. sketch-photo recognition or re-identification) faces an information asymmetry challenge because the sketch modality is inherently abstract. Existing works tend to learn shared embedding spaces with CNN models, either by discarding the appearance cues of photo images or by introducing GANs for sketch-photo synthesis. The former unavoidably loses discriminability, while the latter introduces generation noise that cannot be removed. In this paper, motivated by the great promise transformers have shown for discriminative visual modelling, we make the first attempt to design an information-aligned sketch transformer (SketchTrans) via cross-modal disentangled prototype learning. Specifically, we design an asymmetric disentanglement scheme with a dynamically updatable auxiliary sketch (A-sketch) to align the modality representations without sacrificing information. The asymmetric disentanglement decomposes the photo representations into sketch-relevant and sketch-irrelevant cues and transfers the sketch-irrelevant knowledge into the sketch modality to compensate for the missing information. Moreover, considering the feature discrepancy between the two modalities, we present a modality-aware prototype contrastive learning method that mines representative modality-sharing information using modality-aware prototypes rather than the original feature representations. Extensive experiments on category-level and instance-level sketch-based datasets validate the superiority of the proposed method under various metrics.
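
To make the idea of contrasting prototypes instead of raw features more concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify how the prototypes are built or which loss is used, so the mean-per-class prototypes, the cosine similarity, the InfoNCE-style loss, the temperature value, and all function names here are assumptions chosen for illustration only.

```python
import math


def class_prototypes(features, labels):
    """Average the feature vectors of each class into one prototype (assumed design)."""
    sums, counts = {}, {}
    for vec, lab in zip(features, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
            counts[lab] = 0
        sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


def cosine(a, b):
    """Cosine similarity between two vectors, with a small epsilon for stability."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def prototype_contrastive_loss(sketch_protos, photo_protos, temperature=0.1):
    """InfoNCE-style loss over prototypes (hypothetical stand-in for the paper's loss).

    Each sketch prototype is pulled toward the photo prototype of the same
    class and pushed away from the photo prototypes of all other classes.
    """
    classes = sorted(set(sketch_protos) & set(photo_protos))
    if not classes:
        raise ValueError("the two modalities share no class labels")
    total = 0.0
    for idx, c in enumerate(classes):
        logits = [cosine(sketch_protos[c], photo_protos[k]) / temperature
                  for k in classes]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[idx]
    return total / len(classes)


# Toy features: two classes, two samples per class in each modality.
sketch_feats = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]]
photo_feats = [[0.8, 0.2], [1.0, 0.0], [0.1, 1.0], [0.0, 0.8]]
labels = [0, 0, 1, 1]

sketch_protos = class_prototypes(sketch_feats, labels)
photo_protos = class_prototypes(photo_feats, labels)
loss = prototype_contrastive_loss(sketch_protos, photo_protos)
```

Computing one prototype per class and per modality before contrasting means the loss compares class-level summaries, which is one plausible reading of "modality-aware prototypes rather than the original feature representations"; the actual method may build and update its prototypes quite differently.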

Source
http://dx.doi.org/10.1109/TPAMI.2023.3337005

Publication Analysis

Top Keywords

disentangled prototype (8)
prototype learning (8)
learning transformer (8)
sketch-photo recognition (8)
sketch modality (8)
asymmetric disentanglement (8)
sketchtrans disentangled (4)
transformer sketch-photo (4)
recognition matching (4)
matching hand-drawn (4)

Similar Publications

Orbitronics and valleytronics, analogous to spintronics, leverage the orbital degree of freedom and the valley degree of freedom of electrons to carry information, promising significant advancements in information processing. In this study, we disentangle the orbital and valley Nernst effect (VNE) in 2D monolayers, based on the global symmetry of the monolayers. We conduct an in-depth analysis of the orbital (valley) Nernst effect in inversion symmetric (asymmetric) monolayers, using an analytical tight binding model.

The spectral and transport properties of strongly correlated metals, such as SrVO_{3} (SVO), are widely attributed to electron-electron (e-e) interactions, with lattice vibrations (phonons) playing a secondary role. Here, using first-principles electron-phonon (e-ph) and dynamical mean field theory calculations, we show that e-ph interactions play an essential role in SVO: they govern the electron scattering and resistivity in a wide temperature range down to 30 K, and induce an experimentally observed kink in the spectral function. In contrast, the e-e interactions control quasiparticle renormalization and low temperature transport, and enhance the e-ph coupling.

The competition between host-guest binding and solvent interactions is a crucial factor in determining the binding affinities and selectivity of molecular receptor species. The interplay between these competing interactions, however, has been difficult to disentangle. In particular, the development of molecular-level descriptions of solute-solvent interactions remains a grand experimental challenge.

Deep cooperative multi-agent reinforcement learning has demonstrated remarkable success over a wide spectrum of complex control tasks. However, recent advances in multi-agent learning mainly focus on value decomposition while leaving entity interactions intertwined, which easily leads to over-fitting on noisy interactions between entities. In this work, we introduce a novel interactiOn Pattern disenTangling (OPT) method to disentangle the entity interactions into interaction prototypes, each of which represents an underlying interaction pattern within a subgroup of the entities.

This article studies an emerging practical problem called heterogeneous prototype learning (HPL). Unlike the conventional heterogeneous face synthesis (HFS) problem that focuses on precisely translating a face image from a source domain to another target one without removing facial variations, HPL aims at learning the variation-free prototype of an image in the target domain while preserving the identity characteristics. HPL is a compounded problem involving two cross-coupled subproblems, that is, domain transfer and prototype learning (PL), thus making most of the existing HFS methods that simply transfer the domain style of images unsuitable for HPL.
