Image-text matching is the task of measuring the visual-semantic similarity between an image and a sentence. Recently, fine-grained matching methods that explore the local alignment between image regions and sentence words have shown advances in inferring image-text correspondence by aggregating pairwise region-word similarities. However, local alignment is hard to achieve because some important image regions may be inaccurately detected or missing altogether. Meanwhile, some words with high-level semantics cannot be strictly mapped to a single image region. To tackle these problems, we emphasize the importance of exploiting the global semantic consistency between image regions and sentence words as a complement to the local alignment. In this article, we propose a novel hybrid matching approach named Cross-modal Attention with Semantic Consistency (CASC) for image-text matching. The proposed CASC is a joint framework that performs cross-modal attention for local alignment and multilabel prediction for global semantic consistency. It extracts semantic labels directly from the available sentence corpus without additional labeling cost, and these labels provide a global similarity constraint on the aggregated region-word similarity obtained by the local alignment. Extensive experiments on the Flickr30k and Microsoft COCO (MSCOCO) datasets demonstrate the effectiveness of the proposed CASC in preserving global semantic consistency along with the local alignment, and further show its superior image-text matching performance compared with more than 15 state-of-the-art methods.
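The two ingredients named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the softmax attention, the `scale` parameter, the averaging over words, the whitespace-based label extraction, and the binary cross-entropy loss are all assumptions chosen for illustration, and every function name below is hypothetical. Region and word features are assumed to already live in a shared embedding space of equal dimension.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0

def softmax(scores, scale=1.0):
    """Numerically stable softmax; `scale` sharpens the weights (assumed parameter)."""
    peak = max(scores)
    exps = [math.exp(scale * (s - peak)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def local_similarity(regions, words, scale=10.0):
    """Local alignment sketch: each word attends over all regions, and the
    word-to-attended-region similarities are averaged into one score."""
    per_word = []
    for word in words:
        weights = softmax([cosine(word, r) for r in regions], scale)
        # Attention-weighted sum of region vectors for this word.
        attended = [sum(w * r[d] for w, r in zip(weights, regions))
                    for d in range(len(word))]
        per_word.append(cosine(word, attended))
    return sum(per_word) / len(per_word)

def extract_labels(sentence, label_vocab):
    """Global labels sketch: multi-hot vector marking which vocabulary labels
    occur in the sentence (simple whitespace tokenization, an assumption)."""
    tokens = set(sentence.lower().split())
    return [1.0 if label in tokens else 0.0 for label in label_vocab]

def multilabel_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy between predicted label probabilities and
    the multi-hot targets (one common choice for multilabel prediction)."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(targets)
```

In a joint framework of the kind described, a score such as `local_similarity` would drive the matching objective while a term such as `multilabel_loss` would penalize image predictions that disagree with labels taken from the paired sentence; how the two are weighted is not stated in the abstract.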


Source
http://dx.doi.org/10.1109/TNNLS.2020.2967597

Publication Analysis

Top Keywords

local alignment: 24
semantic consistence: 16
image-text matching: 16
cross-modal attention: 12
image regions: 12
global semantic: 12
attention semantic: 8
regions sentence: 8
region-word similarity: 8
similarity local: 8
