Proto-Adapter: Efficient Training-Free CLIP-Adapter for Few-Shot Image Classification.

Sensors (Basel)

Department of Electrical Engineering, Faculty of Science and Technology, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama 223-8522, Kanagawa, Japan.

Published: June 2024

AI Article Synopsis

  • Large vision-language models like CLIP have strong zero-shot transfer abilities. Tip-Adapter improves their few-shot recognition, but its adapter grows with the number of training samples.
  • Proto-Adapter is introduced as a more efficient alternative: a constant-sized adapter whose weights are derived from per-class prototype representations, which outperforms Tip-Adapter without inflating model size.
  • Fine-tuning with a distance margin penalty further improves Proto-Adapter, helping it reach a discriminative decision boundary even with limited training data, as shown in experiments on diverse datasets.

Article Abstract

Large vision-language models, such as Contrastive Vision-Language Pre-training (CLIP), pre-trained on large-scale image-text datasets, have demonstrated robust zero-shot transfer capabilities across various downstream tasks. To further enhance the few-shot recognition performance of CLIP, Tip-Adapter augments the CLIP model with an adapter that incorporates a key-value cache model constructed from the few-shot training set. This approach enables training-free adaptation and has shown significant improvements in few-shot recognition, especially with additional fine-tuning. However, the size of the adapter increases in proportion to the number of training samples, making it difficult to deploy in practical applications. In this paper, we propose a novel CLIP adaptation method, named Proto-Adapter, which employs a single-layer adapter of constant size regardless of the amount of training data and even outperforms Tip-Adapter. Proto-Adapter constructs the adapter's weights based on prototype representations for each class. By aggregating the features of the training samples, it successfully reduces the size of the adapter without compromising performance. Moreover, the performance of the model can be further enhanced by fine-tuning the adapter's weights using a distance margin penalty, which imposes additional inter-class discrepancy on the output logits. We posit that this training scheme allows us to obtain a model with a discriminative decision boundary even when trained with a limited amount of data. We demonstrate the effectiveness of the proposed method through extensive few-shot classification experiments on diverse datasets.
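The abstract gives no formulas, so the following is only a minimal NumPy sketch of the idea it describes, not the authors' implementation. It assumes that a class prototype is the re-normalized mean of that class's L2-normalized image features, that the adapter output is combined with the zero-shot CLIP logits through a Tip-Adapter-style exponential affinity, and that the margin penalty is an additive margin subtracted from the target-class logit during fine-tuning. The function names, the hyperparameters `alpha`, `beta`, `logit_scale`, and `margin`, and the random features standing in for CLIP embeddings are all hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def build_prototypes(features, labels, num_classes):
    """Aggregate few-shot features into one prototype per class.

    Assumption: prototype = re-normalized mean of normalized features.
    Every class must have at least one training sample.
    Returns an array of shape (num_classes, dim), independent of shot count.
    """
    features = l2_normalize(features)
    protos = np.zeros((num_classes, features.shape[1]))
    for c in range(num_classes):
        protos[c] = features[labels == c].mean(axis=0)
    return l2_normalize(protos)


def adapter_logits(query, protos, text_weights,
                   alpha=1.0, beta=5.5, logit_scale=100.0):
    """Combine zero-shot logits with a prototype-based adapter term.

    Assumption: the affinity function exp(-beta * (1 - cos)) is borrowed
    from Tip-Adapter; the paper's exact form may differ.
    """
    query = l2_normalize(query)
    affinity = query @ protos.T                      # cosine similarity
    proto_logits = np.exp(-beta * (1.0 - affinity))  # adapter output
    clip_logits = logit_scale * (query @ text_weights.T)
    return clip_logits + alpha * proto_logits


def apply_margin(logits, targets, margin=0.1):
    """Subtract a margin from each sample's target-class logit.

    Assumption: an additive margin, used only while fine-tuning, so the
    correct class must win by at least `margin` to reach the same loss.
    """
    penalized = logits.copy()
    penalized[np.arange(len(targets)), targets] -= margin
    return penalized


# Toy data standing in for CLIP features: 3 classes, 4 shots, 8 dimensions.
rng = np.random.default_rng(0)
num_classes, shots, dim = 3, 4, 8
train_features = rng.normal(size=(num_classes * shots, dim))
train_labels = np.repeat(np.arange(num_classes), shots)
text_weights = l2_normalize(rng.normal(size=(num_classes, dim)))

protos = build_prototypes(train_features, train_labels, num_classes)

queries = rng.normal(size=(2, dim))
targets = np.array([0, 2])
logits = adapter_logits(queries, protos, text_weights)
penalized = apply_margin(logits, targets)

# A per-sample cache stores one key per training sample; prototypes store
# one row per class, so the adapter size does not grow with the shot count.
print("cache keys:", train_features.shape, "prototypes:", protos.shape)
```

In this sketch the prototype matrix has one row per class, whereas a per-sample key-value cache has one row per training sample. That difference is the constant-size property the abstract emphasizes.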

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11175357
DOI: http://dx.doi.org/10.3390/s24113624
