AI Article Synopsis

  • The paper introduces the Probabilistic Visual Prompt Unified Transformer (PVPUFormer), a method for interactive image segmentation that accepts diverse visual prompts such as clicks, scribbles, and boxes.
  • Whereas existing methods mainly encode only the position or pixel region of each prompt, PVPUFormer also encodes the context around the prompt, which gives richer prompt feedback.
  • Key components are a Probabilistic Prompt-unified Encoder (PPuE) that produces a unified one-dimensional prompt vector, a Prompt-to-Pixel Contrastive (PC) loss that aligns prompt and pixel features, and a Dual-cross Merging Attention (DMA) module for bidirectional interaction between image and prompt features. Experiments on several datasets show consistent improvements from these components.
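
To make the idea of two-way interaction between image and prompt features concrete, here is a minimal sketch in plain Python. It is not the paper's dual-cross merging attention module: it assumes single-head scaled dot-product attention with a residual connection and no learned projections, and the names `cross_attend` and `bidirectional_interaction` are hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, context):
    """Single-head scaled dot-product cross-attention plus a residual.

    Assumption: no learned query/key/value projections, so each query row
    is simply updated with an attention-weighted average of context rows.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    updated = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, c)) for c in context]
        weights = softmax(scores)
        attended = [sum(w * c[j] for w, c in zip(weights, context))
                    for j in range(dim)]
        updated.append([qi + ai for qi, ai in zip(q, attended)])
    return updated

def bidirectional_interaction(image_tokens, prompt_tokens):
    """Hypothetical two-way exchange: prompts read the image, then the
    image reads the updated prompts."""
    new_prompts = cross_attend(prompt_tokens, image_tokens)
    new_image = cross_attend(image_tokens, new_prompts)
    return new_image, new_prompts

# Toy data: four image tokens and two prompt tokens, each of dimension 2.
image_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
prompt_tokens = [[1.0, 0.0], [0.0, 1.0]]
new_image, new_prompts = bidirectional_interaction(image_tokens, prompt_tokens)
```

Both token sets keep their shapes, so a step like this could be stacked or followed by a mask decoder. A real module would add learned projections, multiple heads, and normalization.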

Article Abstract

Integrating diverse visual prompts such as clicks, scribbles, and boxes into interactive image segmentation makes interaction easier for users and improves interaction efficiency. However, existing studies primarily encode the position or pixel regions of prompts without considering the contextual areas around them, which yields insufficient prompt feedback and slows performance gains. To tackle this problem, this paper proposes a simple yet effective Probabilistic Visual Prompt Unified Transformer (PVPUFormer) for interactive image segmentation. It allows users to flexibly input diverse visual prompts and uses probabilistic prompt encoding and feature post-processing to extract sufficient and robust prompt features. Specifically, we first propose a Probabilistic Prompt-unified Encoder (PPuE) that generates a unified one-dimensional vector by exploring both prompt and non-prompt contextual information, offering richer feedback cues. On this basis, we further present a Prompt-to-Pixel Contrastive (PC) loss that aligns prompt and pixel features, bridging the representation gap between them to provide consistent feature representations for mask prediction. Moreover, we design a Dual-cross Merging Attention (DMA) module that implements bidirectional feature interaction between image and prompt features, producing salient features that improve performance. Comprehensive experiments on several challenging datasets demonstrate that the proposed components achieve consistent improvements, yielding state-of-the-art interactive segmentation performance. Our code is available at https://github.com/XuZhang1211/PVPUFormer.
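
The abstract does not give the form of the Prompt-to-Pixel Contrastive loss, so the following is only a generic InfoNCE-style sketch of what aligning a prompt feature with pixel features can look like. The function name `prompt_to_pixel_contrastive`, the cosine similarity, and the temperature of 0.1 are all assumptions, not the authors' formulation.

```python
import math

def normalize(v):
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def prompt_to_pixel_contrastive(prompt, pixels, is_positive, temperature=0.1):
    """Generic InfoNCE-style loss (an assumption, not the paper's PC loss).

    loss = -log( sum over positives of exp(s / t) / sum over all of exp(s / t) )
    where s is the cosine similarity between the prompt and a pixel feature.
    At least one pixel must be marked positive.
    """
    logits = [cosine(prompt, p) / temperature for p in pixels]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(logit - m) for logit in logits]
    pos = sum(e for e, flag in zip(exps, is_positive) if flag)
    return -math.log(pos / sum(exps))

# Toy data: one prompt feature and four pixel features of dimension 2.
prompt = [1.0, 0.0]
pixels = [[0.9, 0.1], [0.8, 0.2], [0.0, 1.0], [-1.0, 0.0]]

# Positives that point the same way as the prompt give a small loss.
aligned = prompt_to_pixel_contrastive(prompt, pixels, [True, True, False, False])
# Marking the dissimilar pixels as positives gives a much larger loss.
misaligned = prompt_to_pixel_contrastive(prompt, pixels, [False, False, True, True])
```

Minimizing a loss of this kind pulls the prompt feature toward the pixel features it should select and pushes it away from the rest, which is one way to narrow a representation gap between the two feature types.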

Source
http://dx.doi.org/10.1109/TIP.2024.3492713

Publication Analysis

Top Keywords

  • interactive image: 12
  • image segmentation: 12
  • probabilistic visual: 8
  • prompt: 8
  • visual prompt: 8
  • prompt unified: 8
  • unified transformer: 8
  • diverse visual: 8
  • visual prompts: 8
  • prompt features: 8
