Rethinking symbolic and visual context in Referring Expression Generation.

Front Artif Intell

Faculty of Linguistics and Literary Studies, Bielefeld University, Bielefeld, Germany.

Published: March 2023

Situational context is crucial for linguistic reference to visible objects, since the same description can refer unambiguously to an object in one context but be ambiguous or misleading in others. This also applies to Referring Expression Generation (REG), where the production of identifying descriptions is always dependent on a given context. Research in REG has long represented visual domains through symbolic information about objects and their properties, in order to determine identifying sets of target features during content determination. In recent years, research in REG has turned to neural modeling and has recast the REG task as an inherently multimodal problem, looking at more natural settings such as generating descriptions for objects in photographs. Characterizing the precise ways in which context influences generation is challenging in both paradigms, as context notoriously lacks precise definition and categorization. In multimodal settings, however, these problems are further exacerbated by the increased complexity and low-level representation of perceptual inputs. The main goal of this article is to provide a systematic review of the types and functions of visual context across various approaches to REG so far, and to argue for integrating and extending the different perspectives on visual context that currently co-exist in research on REG. By analyzing the ways in which symbolic REG integrates context in rule-based approaches, we derive a set of categories of contextual integration, including the distinction between positive and negative semantic forces exerted by context during reference generation. Using this as a framework, we show that existing work in visual REG has so far considered only some of the ways in which visual context can facilitate end-to-end reference generation. Connecting with preceding research in related areas, we highlight, as possible directions for future research, additional ways in which contextual integration can be incorporated into REG and other multimodal generation tasks.
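For readers unfamiliar with content determination in symbolic REG, the following minimal Python sketch illustrates the kind of rule-based attribute selection the abstract refers to, loosely modeled on the classic Incremental Algorithm (Dale and Reiter, 1995). The domain objects, attribute names, and preference order are hypothetical, and the sketch simplifies the original algorithm (for instance, it omits the mandatory inclusion of the head noun's type).

```python
# Minimal sketch of rule-based content determination in symbolic REG,
# loosely following the Incremental Algorithm (Dale & Reiter, 1995).
# The domain, attribute names, and preference order are hypothetical.

def select_attributes(target, distractors, preference_order):
    """Greedily pick attribute-value pairs of the target until all
    distractors in the symbolic context are ruled out."""
    description = {}
    remaining = list(distractors)
    for attr in preference_order:
        value = target.get(attr)
        if value is None:
            continue
        # Keep this attribute only if it rules out at least one distractor.
        survivors = [d for d in remaining if d.get(attr) == value]
        if len(survivors) < len(remaining):
            description[attr] = value
            remaining = survivors
        if not remaining:
            return description  # uniquely identifying in this context
    return None  # no distinguishing description exists in this context

target = {"type": "ball", "colour": "red", "size": "small"}
# The same target object placed in two different symbolic contexts:
context_a = [{"type": "box", "colour": "red", "size": "small"}]
context_b = [{"type": "ball", "colour": "green", "size": "small"},
             {"type": "ball", "colour": "red", "size": "large"}]

print(select_attributes(target, context_a, ["type", "colour", "size"]))
# -> {'type': 'ball'}                      ("the ball" suffices)
print(select_attributes(target, context_b, ["type", "colour", "size"]))
# -> {'colour': 'red', 'size': 'small'}    (more attributes needed)
```

The toy output mirrors the abstract's opening claim: the very same target requires different identifying descriptions depending on which distractors the context supplies.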


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10072327
DOI: http://dx.doi.org/10.3389/frai.2023.1067125

Publication Analysis

Top Keywords

visual context: 16
context: 11
referring expression: 8
expression generation: 8
contextual integration: 8
reference generation: 8
reg: 7
visual: 6
generation: 6
rethinking symbolic: 4

Similar Publications

To address the severe occlusion problem and the tiny-scale object problem in the multi-fitting detection task, a Scene Knowledge Integrating Network (SKIN), comprising a scene filter module (SFM) and a scene structure information module (SSIM), is proposed. First, the particularity of the scene in the multi-fitting detection task is analyzed. On this basis, the aggregation of the fittings is defined as the scene, according to professional knowledge of the power field and the habits of operators in identifying fittings.


Towards Context-Rich Automated Biodiversity Assessments: Deriving AI-Powered Insights from Camera Trap Data.

Sensors (Basel)

December 2024

School of Biological and Environmental Sciences, Liverpool John Moores University, James Parsons Building, Byrom Street, Liverpool L3 3AF, UK.

Camera traps offer enormous new opportunities in ecological studies, but current automated image analysis methods often lack the contextual richness needed to support impactful conservation outcomes. Integrating vision-language models into these workflows could address this gap by providing enhanced contextual understanding and enabling advanced queries across temporal and spatial dimensions. Here, we present an integrated approach that combines deep learning-based vision and language models to improve ecological reporting using data from camera traps.


Generating accurate and contextually rich captions for images and videos is essential for various applications, from assistive technology to content recommendation. However, challenges such as maintaining temporal coherence in videos, reducing noise in large-scale datasets, and enabling real-time captioning remain significant. We introduce MIRA-CAP (Memory-Integrated Retrieval-Augmented Captioning), a novel framework designed to address these issues through three core innovations: a cross-modal memory bank, adaptive dataset pruning, and a streaming decoder.


Restless legs syndrome (RLS) is a common sensorimotor sleep disorder that affects sleep and quality of life. Much effort has been made to advance RLS pharmacotherapy; however, patients with RLS still report poor long-term symptom control. Comprehensive Mendelian randomization (MR) was performed to search for potential causal genes and drug targets using cis-pQTL and RLS GWAS data.


In coronary artery bypass grafting (CABG) on pump, achieving optimal visualization is critical for surgical precision and safety. The use of blowers to clear the CABG anastomosis poses risks, including the formation of micro-embolic gas bubbles, which can be insidious and increase the risk of cerebral or myocardial complications. This retrospective study compares the effectiveness of irrigation mist and CO2 versus a direct CO2 blower without irrigation in terms of visualization, postoperative fibrillation, and micro-embolic gas activity.

