Image and sentence matching has attracted much attention recently, and many effective methods have been proposed for it. However, even current state-of-the-art methods still struggle to associate challenging pairs of images and sentences whose regions and words contain few-shot content. Such a few-shot matching problem is seldom studied and has become a bottleneck for further performance improvement in real-world applications. In this work, we formulate this challenging problem as few-shot image and sentence matching, and accordingly propose an Aligned Cross-Modal Memory (ACMM) model to address it. The model not only softly aligns few-shot regions and words in a weakly-supervised manner, but also persistently stores and updates cross-modal prototypical representations of few-shot classes as references, without using any ground-truth region-word correspondence. The model also adaptively balances the relative importance of few-shot and common content in the image and sentence, which leads to a better measurement of overall similarity. We perform extensive experiments on both few-shot and conventional image and sentence matching, and demonstrate the effectiveness of the proposed model by achieving state-of-the-art results on two public benchmark datasets.
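
As a rough illustration of the two ideas named in the abstract, soft region-word alignment and a persistently updated prototype memory, the sketch below uses cosine-similarity attention and a moving-average slot update. This is not the ACMM model itself: the function names `soft_align` and `update_memory`, the `temperature` and `momentum` parameters, the feature dimensions, and the choice of memory slot are all assumptions made for illustration only.

```python
import numpy as np

def soft_align(regions, words, temperature=0.1):
    """Softly align each word to image regions (hypothetical helper).

    regions: (num_regions, dim) array, words: (num_words, dim) array.
    Returns attention weights and the attended region feature per word.
    """
    # L2-normalise so the dot product is cosine similarity.
    r = regions / np.linalg.norm(regions, axis=1, keepdims=True)
    w = words / np.linalg.norm(words, axis=1, keepdims=True)
    sim = w @ r.T                                  # (num_words, num_regions)
    logits = sim / temperature
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)        # softmax over regions
    attended = attn @ regions                      # (num_words, dim)
    return attn, attended

def update_memory(memory, slot, feature, momentum=0.9):
    """Moving-average update of one prototype slot (assumed update rule)."""
    memory[slot] = momentum * memory[slot] + (1.0 - momentum) * feature
    return memory

# Toy inputs: 5 region features and 3 word features of dimension 8.
rng = np.random.default_rng(0)
regions = rng.normal(size=(5, 8))
words = rng.normal(size=(3, 8))

attn, attended = soft_align(regions, words)

# A 4-slot memory; slot 2 is an arbitrary choice for this example.
memory = np.zeros((4, 8))
memory = update_memory(memory, slot=2, feature=attended[0])
```

Each row of `attn` sums to one, so every word receives a weighted mixture of region features without any region-word labels being supplied, which is the sense in which such an alignment is weakly supervised.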

Source
http://dx.doi.org/10.1109/TPAMI.2021.3052490


