Efficient Multi-Task Training with Adaptive Feature Alignment for Universal Image Segmentation.

Sensors (Basel)

Department of Electrical and Computer Engineering, Illinois Institute of Technology, Chicago, IL 60616, USA.

Published: January 2025

Universal image segmentation aims to handle all segmentation tasks within a single model architecture, ideally with only one training phase. Task-conditioned joint training requires a task token that conditions the model on a specific task during multi-task training. Existing approaches generate the task token from a text input (e.g., "the task is panoptic"). However, such text-based inputs merely serve as labels and fail to capture the inherent differences between tasks, potentially misleading the model. In addition, the discrepancy between visual and textual modalities limits the performance gains of existing text-involved segmentation models. Prevailing modality-alignment methods rely on large-scale uni-modal encoders for both modalities and an extremely large amount of paired training data, which makes them hard to apply to lightweight segmentation models and resource-constrained devices. In this paper, we propose Adaptive Feature Alignment (AFA) integrated with a learnable task token to address these issues. The learnable task token automatically captures inter-task differences from both image features and text queries during training, providing a more effective and efficient solution than a predefined text-based token. To efficiently align the two modalities without introducing extra complexity, we reconsider the differences between a text token and an image token and replace image features with class-specific means in our proposed AFA. We evaluate our model on the ADE20K and Cityscapes datasets. Experimental results demonstrate that our model surpasses baseline models in both efficiency and effectiveness, achieving state-of-the-art performance among segmentation models with a comparable number of parameters.
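The abstract gives no implementation details for AFA, so the following is only a minimal sketch of the one idea it names: replacing per-pixel image features with class-specific means before comparing them to text tokens. The function names `class_mean_features` and `alignment_loss`, the array shapes, and the cosine-distance loss are all assumptions made for illustration; none of them are taken from the paper.

```python
import numpy as np


def class_mean_features(features, labels, num_classes):
    """Average per-pixel features over the pixels of each class.

    features: (H, W, D) float array of image features (assumed layout).
    labels:   (H, W) int array of class ids in [0, num_classes).
    Returns (means, present): means has shape (num_classes, D) and
    present is a boolean mask of the classes that occur in labels.
    """
    d = features.shape[-1]
    flat_feats = features.reshape(-1, d)
    flat_labels = labels.reshape(-1)

    # Accumulate feature sums per class; np.add.at handles repeated indices.
    sums = np.zeros((num_classes, d), dtype=np.float64)
    np.add.at(sums, flat_labels, flat_feats)

    counts = np.bincount(flat_labels, minlength=num_classes)
    present = counts > 0

    # Classes absent from the label map keep a zero vector.
    means = np.zeros_like(sums)
    means[present] = sums[present] / counts[present, None]
    return means, present


def alignment_loss(class_means, text_embeddings, present, eps=1e-8):
    """Hypothetical loss: mean cosine distance between each present
    class mean and the text embedding of the same class."""
    a = class_means[present]
    b = text_embeddings[present]
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return float(np.mean(1.0 - np.sum(a * b, axis=1)))
```

Under these assumptions, each class contributes one vector per image regardless of how many pixels it covers, so the comparison with text embeddings is one-to-one per class and adds no learned parameters. Whether the published method uses this loss or this pooling scheme is not stated in the abstract.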

Source
http://dx.doi.org/10.3390/s25020359

