CLIP-Driven Prototype Network for Few-Shot Semantic Segmentation.

Entropy (Basel)

College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China.

Published: September 2023

Recent research has shown that visual-text pretrained models perform well in traditional vision tasks. CLIP, one of the most influential of these models, has garnered significant attention from researchers. Thanks to its strong visual representation capabilities, many recent studies have applied CLIP to pixel-level tasks. We explore the potential of CLIP for few-shot segmentation. The current mainstream approach uses support and query features to generate class prototypes and then matches those prototypes against image features. We propose a new method that uses CLIP to extract text features for a specific class. These text features then serve as training samples in the model's training process. Adding text features enables the model to extract features that carry richer semantic information, making it easier to capture latent class information. To better match the query image features, we also propose a new prototype generation method that incorporates multi-modal fusion features of text and images. Adaptive query prototypes are generated by combining foreground and background information from the images with the multi-modal support prototype, which allows better matching of image features and improves segmentation accuracy. We provide a new perspective on few-shot segmentation in multi-modal scenarios. Experiments demonstrate that the proposed method achieves excellent results on two common datasets, PASCAL-5i and COCO-20i.
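The prototype-matching idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the convex-combination fusion with weight `alpha`, and the foreground-versus-background comparison are all assumptions chosen for illustration, and the adaptive query prototype step is omitted. The text embedding is assumed to come from a CLIP text encoder and to share the channel dimension of the image features.

```python
import numpy as np

EPS = 1e-8  # guards against division by zero


def masked_average_pooling(features, mask):
    """Average the (C, H, W) features over pixels where the (H, W) mask is 1."""
    mask = mask.astype(features.dtype)
    total = (features * mask[None]).sum(axis=(1, 2))
    return total / (mask.sum() + EPS)


def fuse_with_text(visual_proto, text_embedding, alpha=0.5):
    """Hypothetical fusion: weighted sum of the L2-normalised vectors."""
    v = visual_proto / (np.linalg.norm(visual_proto) + EPS)
    t = text_embedding / (np.linalg.norm(text_embedding) + EPS)
    return alpha * v + (1.0 - alpha) * t


def cosine_similarity_map(features, prototype):
    """Cosine similarity between every pixel feature and the prototype -> (H, W)."""
    feat_norm = np.linalg.norm(features, axis=0) + EPS
    proto_norm = np.linalg.norm(prototype) + EPS
    dots = np.einsum("chw,c->hw", features, prototype)
    return dots / (feat_norm * proto_norm)


def segment_query(support_feat, support_mask, query_feat, text_embedding, alpha=0.5):
    """Label a query pixel as foreground when it is closer to the fused
    foreground prototype than to the support background prototype."""
    fg_proto = fuse_with_text(
        masked_average_pooling(support_feat, support_mask), text_embedding, alpha
    )
    bg_proto = masked_average_pooling(support_feat, 1 - support_mask)
    fg_sim = cosine_similarity_map(query_feat, fg_proto)
    bg_sim = cosine_similarity_map(query_feat, bg_proto)
    return (fg_sim > bg_sim).astype(np.uint8)
```

In practice the feature maps would come from a pretrained backbone and the similarity maps would usually feed a learned decoder rather than a hard comparison; the sketch only shows how a text embedding can enter the prototype before matching.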


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10529322
DOI: http://dx.doi.org/10.3390/e25091353
