Supporting vision-language model few-shot inference with confounder-pruned knowledge prompt.

Neural Netw

National Key Laboratory of Space Integrated Information System, Institute of Software, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China.

Published: January 2025

Vision-language models are pre-trained by aligning image-text pairs in a common space so that they can handle open-set visual concepts. Recent works adopt fixed or learnable prompts, i.e., classification weights synthesized from natural language descriptions of task-relevant categories, to reduce the gap between the pre-training and inference tasks. However, how and what prompts improve inference performance remains unclear. In this paper, we explicitly clarify the importance of incorporating semantic information into prompts, whereas existing prompting methods generate prompts without sufficiently exploring the semantic information of textual labels. Manually constructing prompts with rich semantics requires domain expertise and is extremely time-consuming. To cope with this issue, we propose a knowledge-aware prompt learning method, namely Confounder-pruned Knowledge Prompt (CPKP), which retrieves an ontology knowledge graph by treating the textual label as a query to extract task-relevant semantic information. CPKP further introduces a double-tier confounder-pruning procedure to refine the derived semantic information: graph-tier confounders are gradually identified and phased out following the individual-causal-effect principle, and feature-tier confounders are eliminated following the maximum-entropy principle from information theory. Empirically, evaluations demonstrate the effectiveness of CPKP in few-shot inference; e.g., with only two shots, CPKP outperforms the manual-prompt method by 4.64% and the learnable-prompt method by 1.09% on average.
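To make the pipeline described above concrete, the following is a minimal, illustrative Python sketch of the three stages the abstract names: retrieving an ontology subgraph with the class label as query, pruning graph-tier confounders, and pruning feature-tier confounders. This is not the authors' implementation; every function, data structure, and threshold here is a hypothetical placeholder, and the edge-scoring and variance-based dimension selection are only crude stand-ins for the paper's individual-causal-effect and maximum-entropy criteria.

```python
# Conceptual sketch of confounder-pruned knowledge prompting (hypothetical names,
# not the released CPKP code).
import numpy as np


def retrieve_subgraph(label, ontology_edges):
    """Treat the textual class label as a query and keep ontology edges that mention it."""
    return [e for e in ontology_edges if label in (e["head"], e["tail"])]


def prune_graph_tier(edges, causal_effect, threshold=0.0):
    """Graph tier: drop edges whose estimated effect on the prediction is not positive
    (a crude stand-in for the gradual individual-causal-effect procedure)."""
    return [e for e in edges if causal_effect(e) > threshold]


def prune_feature_tier(features, keep_ratio=0.75):
    """Feature tier: keep the most informative prompt-feature dimensions.
    Per-dimension variance is used here as a rough proxy for entropy."""
    variances = features.var(axis=0)
    k = max(1, int(keep_ratio * features.shape[1]))
    keep = np.sort(np.argsort(variances)[-k:])
    return features[:, keep]


def build_knowledge_prompt(label, ontology_edges, causal_effect):
    """Compose a natural-language prompt from the pruned subgraph."""
    edges = prune_graph_tier(retrieve_subgraph(label, ontology_edges), causal_effect)
    facts = ", ".join(f"{e['head']} {e['relation']} {e['tail']}" for e in edges)
    return f"a photo of a {label}, which {facts}." if facts else f"a photo of a {label}."


if __name__ == "__main__":
    toy_ontology = [
        {"head": "zebra", "relation": "has", "tail": "black-and-white stripes"},
        {"head": "zebra", "relation": "lives in", "tail": "grassland"},
        {"head": "zebra", "relation": "co-occurs with", "tail": "photographer"},  # likely confounder
    ]
    # Hypothetical effect scores; a real system would estimate these from data.
    effect = lambda e: {"has": 0.9, "lives in": 0.4, "co-occurs with": -0.2}[e["relation"]]
    print(build_knowledge_prompt("zebra", toy_ontology, effect))

    rng = np.random.default_rng(0)
    prompt_features = rng.normal(size=(16, 8))        # stand-in text-encoder outputs
    print(prune_feature_tier(prompt_features).shape)  # (16, 6)
```

The sketch only shows the shape of the method: task-relevant facts retrieved from a knowledge graph enrich the prompt, while the two pruning steps remove spurious graph edges and uninformative prompt-feature dimensions before few-shot inference.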


Source
http://dx.doi.org/10.1016/j.neunet.2025.107173

Publication Analysis

Top Keywords

few-shot inference (8); confounder-pruned knowledge (8); knowledge prompt (8); prompts (5); supporting vision-language (4); vision-language model (4); model few-shot (4); inference (4); inference confounder-pruned (4); prompt vision-language (4)

Similar Publications


A vast multitude of tasks in histopathology could potentially benefit from the support of artificial intelligence (AI). Many examples have been shown in the literature, and the first commercial products with FDA or CE-IVDR clearance are available. However, two key challenges remain: (1) a scarcity of thoroughly annotated images and the laboriousness of producing such annotations, and (2) the creation of robust models that can cope with the data heterogeneity in the field (domain generalization).


Motivation: This study aims to develop an AI-driven framework that leverages large language models (LLMs) to simulate scientific reasoning and peer review to predict efficacious combinatorial therapy when data-driven prediction is infeasible.

Results: Our proposed framework achieved a significantly higher accuracy (0.74) than traditional knowledge-based prediction (0.


Background And Objectives: Chest X-ray (CXR) images are commonly used to diagnose respiratory and cardiovascular diseases. However, traditional manual interpretation is often subjective, time-consuming, and prone to errors, leading to inconsistent detection accuracy and poor generalization. In this paper, we present deep learning-based object detection methods for automatically identifying and annotating abnormal regions in CXR images.


Accurate counting of cereal crops, e.g., maize, rice, sorghum, and wheat, is crucial for estimating grain production and ensuring food security.

