Multimodal learning improves task performance by integrating complementary data sources, and ideally remains robust when the information in some modalities is missing or corrupted.
In practice, however, current multimodal networks suffer substantial performance drops when one or more modalities are missing at test time.
The authors propose a parameter-efficient adaptation method that adjusts the features of pretrained networks to compensate for missing modalities, outperforming independently trained dedicated networks while requiring only minimal additional parameters.
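To make the idea concrete, the sketch below shows one plausible form such an adapter could take: a small FiLM-style module that scales and shifts a frozen backbone's intermediate features, conditioned on a binary mask indicating which modalities are present. This is a minimal illustration under stated assumptions, not the authors' exact method; all names (`ModalityAdapter`, `rank`, the mask layout) are hypothetical.

```python
import torch
import torch.nn as nn

class ModalityAdapter(nn.Module):
    """Lightweight adapter that scales and shifts frozen-backbone features
    based on a binary modality-availability mask (illustrative sketch)."""

    def __init__(self, num_modalities: int, feat_dim: int, rank: int = 8):
        super().__init__()
        # Low-rank bottleneck keeps the added parameter count small.
        self.down = nn.Linear(num_modalities, rank)
        self.up = nn.Linear(rank, 2 * feat_dim)  # produces (scale, shift)

    def forward(self, feats: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # feats: (batch, feat_dim) intermediate features from a frozen network
        # mask:  (batch, num_modalities), 1 = modality present, 0 = missing
        scale, shift = self.up(torch.relu(self.down(mask))).chunk(2, dim=-1)
        # Residual-style modulation: near-identity when scale/shift are small,
        # so the frozen network's behavior is preserved for complete inputs.
        return feats * (1 + scale) + shift

# Usage: adapt features when the second modality (e.g., audio) is missing.
adapter = ModalityAdapter(num_modalities=3, feat_dim=256)
feats = torch.randn(4, 256)              # features from the frozen backbone
mask = torch.tensor([[1., 0., 1.]] * 4)  # modality 1 absent for this batch
adapted = adapter(feats, mask)           # same shape as input: (4, 256)
```

Because only the adapter is trained while the backbone stays frozen, the extra parameter cost is just the two small linear layers, consistent with the paper's emphasis on minimal additional parameters.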