AI Article Synopsis

Article Abstract

Generating accurate and contextually rich captions for images and videos is essential for various applications, from assistive technology to content recommendation. However, challenges such as maintaining temporal coherence in videos, reducing noise in large-scale datasets, and enabling real-time captioning remain significant. We introduce MIRA-CAP (Memory-Integrated Retrieval-Augmented Captioning), a novel framework designed to address these issues through three core innovations: a cross-modal memory bank, adaptive dataset pruning, and a streaming decoder. The cross-modal memory bank retrieves relevant context from prior frames, enhancing temporal consistency and narrative flow. The adaptive pruning mechanism filters noisy data, which improves alignment and generalization. The streaming decoder allows for real-time captioning by generating captions incrementally, without requiring access to the full video sequence. Evaluated across standard datasets like MS COCO, YouCook2, ActivityNet, and Flickr30k, MIRA-CAP achieves state-of-the-art results, with high scores on CIDEr, SPICE, and Polos metrics, underscoring its alignment with human judgment and its effectiveness in handling complex visual and temporal structures. This work demonstrates that MIRA-CAP offers a robust, scalable solution for both static and dynamic captioning tasks, advancing the capabilities of vision-language models in real-world applications.
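
The abstract describes MIRA-CAP's components only at a high level. As a purely illustrative aid, the sketch below shows one way a cross-modal memory bank and a streaming decoding loop could fit together: frame embeddings are stored as they arrive, each new frame retrieves its most similar prior context by cosine similarity, and a caption fragment is emitted per frame without access to the full video. This is a minimal toy in Python, not the authors' implementation; the names (CrossModalMemoryBank, stream_captions, toy_decode_step) and the retrieval scheme are assumptions made for illustration.

```python
# Illustrative sketch only: a toy cross-modal memory bank and streaming
# caption loop inspired by the MIRA-CAP description above. Not the authors' code.
import numpy as np


class CrossModalMemoryBank:
    """Toy memory bank: stores frame embeddings, retrieves similar prior context."""

    def __init__(self, top_k: int = 3):
        self.top_k = top_k
        self.embeddings: list[np.ndarray] = []

    def add(self, frame_embedding: np.ndarray) -> None:
        self.embeddings.append(frame_embedding)

    def retrieve(self, query: np.ndarray) -> list[np.ndarray]:
        """Return up to top_k stored embeddings ranked by cosine similarity."""
        if not self.embeddings:
            return []
        stacked = np.stack(self.embeddings)
        sims = stacked @ query / (
            np.linalg.norm(stacked, axis=1) * np.linalg.norm(query) + 1e-8
        )
        top = np.argsort(sims)[::-1][: self.top_k]
        return [self.embeddings[i] for i in top]


def stream_captions(frame_embeddings, decode_step, memory):
    """Emit one caption fragment per incoming frame: each step sees only the
    current frame plus retrieved memory context, never the full video."""
    for frame in frame_embeddings:
        context = memory.retrieve(frame)
        yield decode_step(frame, context)  # stand-in for a trained decoder
        memory.add(frame)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = [rng.normal(size=16) for _ in range(5)]

    def toy_decode_step(frame, context):
        # Placeholder decoder: just reports how much context was retrieved.
        return f"caption fragment (conditioned on {len(context)} retrieved frames)"

    for caption in stream_captions(frames, toy_decode_step, CrossModalMemoryBank()):
        print(caption)
```

In the actual system, the decode step would be a trained vision-language decoder and the memory entries would be cross-modal features rather than random vectors; the toy above only mirrors the incremental control flow described in the abstract.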

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11679459
DOI: http://dx.doi.org/10.3390/s24248013

Publication Analysis

Top Keywords (frequency)

mira-cap memory-integrated (8)
memory-integrated retrieval-augmented (8)
retrieval-augmented captioning (8)
captioning generating (8)
real-time captioning (8)
cross-modal memory (8)
memory bank (8)
streaming decoder (8)
captioning (6)
mira-cap (4)

