IEEE Trans Neural Netw Learn Syst
July 2024
Recent deep learning models can efficiently combine inputs from different modalities (e.g., images and text) and learn to align their latent representations or to translate signals from one domain to another (as in image captioning or text-to-image generation).