Toward Robust Referring Image Segmentation.

Jianzong Wu Xiangtai Li Xia Li Henghui Ding Yunhai Tong Dacheng Tao

IEEE Trans Image Process

Published: March 2024

Referring Image Segmentation (RIS) traditionally outputs object masks based on text descriptions, but struggles with misleading descriptions that don't correspond to the image.
The authors introduce Robust Referring Image Segmentation (R-RIS), a formulation that handles negative sentences (descriptions of objects that are not in the image) alongside regular positive sentences.
They also present a transformer-based model, RefSegformer, build three R-RIS datasets by augmenting existing RIS datasets with negative sentences, and propose metrics that evaluate both input types together, reporting state-of-the-art results on both RIS and R-RIS.

Referring Image Segmentation (RIS) is a fundamental vision-language task that outputs object masks based on text descriptions. Many works have achieved considerable progress on RIS, including a variety of fusion method designs. In this work, we explore an essential question: "What if the text description is wrong or misleading?" For example, the described objects may not be in the image. We call such a sentence a negative sentence. Existing solutions for RIS cannot handle this setting. To this end, we propose a new formulation of RIS, named Robust Referring Image Segmentation (R-RIS), which considers negative sentence inputs in addition to the regular positive text inputs. To facilitate this new task, we create three R-RIS datasets by augmenting existing RIS datasets with negative sentences, and we propose new metrics to evaluate both types of inputs in a unified manner. Furthermore, we propose a new transformer-based model, called RefSegformer, with a token-based vision and language fusion module. Our design can be easily extended to the R-RIS setting by adding extra blank tokens. RefSegformer achieves state-of-the-art results on both RIS and R-RIS datasets, establishing a solid baseline for both settings. Our project page is at https://github.com/jianzongwu/robust-ref-seg.
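To make the idea of scoring positive and negative sentences "in a unified manner" concrete, here is a minimal sketch of one possible convention. It is an assumption for illustration, not the metric defined in the paper: a negative sentence is given an all-zero ground-truth mask, and IoU is defined as 1.0 when both the predicted and the ground-truth masks are empty. The names `mask_iou` and `mean_iou` are hypothetical.

```python
from typing import Iterable, Sequence, Tuple

# A mask is a 2-D grid of 0/1 values (1 = pixel belongs to the referred object).
Mask = Sequence[Sequence[int]]


def mask_iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two binary masks of the same shape.

    Assumed convention (not taken from the paper): if both masks are empty,
    the prediction is treated as fully correct and the score is 1.0. This lets
    a negative sentence, whose target mask is all zeros, be scored with the
    same function as a positive sentence.
    """
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    if union == 0:
        return 1.0
    return inter / union


def mean_iou(pairs: Iterable[Tuple[Mask, Mask]]) -> float:
    """Average IoU over (prediction, target) pairs of either sentence type."""
    scores = [mask_iou(pred, target) for pred, target in pairs]
    return sum(scores) / len(scores)


# Toy 2x2 example.
empty = [[0, 0], [0, 0]]          # target mask for a negative sentence
obj = [[1, 1], [0, 0]]            # target mask for a positive sentence

samples = [
    ([[1, 0], [0, 0]], obj),      # positive sentence, half of the object found
    (empty, empty),               # negative sentence, correctly left empty
    (obj, empty),                 # negative sentence, object wrongly segmented
]
print(mean_iou(samples))
```

Under this convention, a model that always segments something is penalized on every negative sentence, while a model that always outputs an empty mask is penalized on every positive one.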

DOI: http://dx.doi.org/10.1109/TIP.2024.3371348

Publication Analysis

Top Keywords

referring image

image segmentation

robust referring

negative sentence

r-ris datasets

ris

image

segmentation referring

segmentation ris

ris fundamental
