Recently, memory-based methods have achieved remarkable progress in video object segmentation. However, the segmentation performance is still limited by error accumulation and redundant memory, primarily because of 1) the semantic gap caused by similarity matching and memory reading via heterogeneous key-value encoding; 2) the continuously growing and inaccurate memory through directly storing unreliable predictions of all previous frames. To address these issues, we propose an efficient, effective, and robust segmentation method based on Isogenous Memory Sampling and Frame-Relation mining (IMSFR). Specifically, by utilizing an isogenous memory sampling module, IMSFR consistently conducts memory matching and reading between sampled historical frames and the current frame in an isogenous space, minimizing the semantic gap while speeding up the model through an efficient random sampling. Furthermore, to avoid key information loss during the sampling process, we further design a frame-relation temporal memory module to mine inter-frame relations, thereby effectively preserving contextual information from the video sequence and alleviating error accumulation. Extensive experiments demonstrate the effectiveness and efficiency of the proposed IMSFR method. In particular, our IMSFR achieves state-of-the-art performance on six commonly used benchmarks in terms of region similarity & contour accuracy and speed. Our model also exhibits strong robustness against frame sampling due to its large receptive field.

Download full-text PDF

Source
http://dx.doi.org/10.1109/TIP.2023.3280389DOI Listing

Publication Analysis

Top Keywords

isogenous memory
12
memory sampling
12
video object
8
object segmentation
8
memory
8
error accumulation
8
semantic gap
8
sampling
6
efficient robust
4
robust video
4

Similar Publications

Recently, memory-based methods have achieved remarkable progress in video object segmentation. However, the segmentation performance is still limited by error accumulation and redundant memory, primarily because of 1) the semantic gap caused by similarity matching and memory reading via heterogeneous key-value encoding; 2) the continuously growing and inaccurate memory through directly storing unreliable predictions of all previous frames. To address these issues, we propose an efficient, effective, and robust segmentation method based on Isogenous Memory Sampling and Frame-Relation mining (IMSFR).

View Article and Find Full Text PDF

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!