Situational context is crucial for linguistic reference to visible objects: the same description can refer unambiguously to an object in one context but be ambiguous or misleading in another. This also applies to Referring Expression Generation (REG), where the production of identifying descriptions always depends on a given context. Research in REG has long represented visual domains through information about objects and their properties, in order to determine identifying sets of target features during content determination. In recent years, research in REG has turned to neural modeling and recast the REG task as an inherently multimodal problem, looking at more natural settings such as generating descriptions for objects in photographs. Characterizing the precise ways in which context influences generation is challenging in both paradigms, as context notoriously lacks precise definition and categorization. In multimodal settings, however, these problems are further exacerbated by the increased complexity and low-level representation of perceptual inputs. The main goal of this article is to provide a systematic review of the types and functions of visual context across existing approaches to REG, and to argue for integrating and extending the different perspectives on visual context that currently co-exist in REG research. By analyzing how symbolic REG integrates context in rule-based approaches, we derive a set of categories of contextual integration, including a distinction between different forms of influence exerted by context during reference generation. Using this as a framework, we show that existing work in visual REG has considered only some of the ways in which visual context can facilitate end-to-end reference generation. Connecting with preceding research in related areas, we highlight, as possible directions for future research, additional ways in which contextual integration can be incorporated into REG and other multimodal generation tasks.
Full text: PMC http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10072327 | DOI http://dx.doi.org/10.3389/frai.2023.1067125
Sensors (Basel)
December 2024
Automation Department, North China Electric Power University, Baoding 071003, China.
To address the severe occlusion problem and the tiny-scale object problem in the multi-fitting detection task, the Scene Knowledge Integrating Network (SKIN) is proposed, comprising a scene filter module (SFM) and a scene structure information module (SSIM). First, the particularity of the scene in the multi-fitting detection task is analyzed: drawing on professional knowledge of the power field and on how operators identify fittings in practice, an aggregation of fittings is defined as a scene.
Sensors (Basel)
December 2024
School of Biological and Environmental Sciences, Liverpool John Moores University, James Parsons Building, Byrom Street, Liverpool L3 3AF, UK.
Camera traps offer enormous new opportunities in ecological studies, but current automated image analysis methods often lack the contextual richness needed to support impactful conservation outcomes. Integrating vision-language models into these workflows could address this gap by providing enhanced contextual understanding and enabling advanced queries across temporal and spatial dimensions. Here, we present an integrated approach that combines deep learning-based vision and language models to improve ecological reporting using data from camera traps.
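The temporal and spatial queries described above can be pictured as a post-processing step over detector/VLM output. The sketch below is purely illustrative and assumes hypothetical record fields (`site`, `time`, `caption`) rather than anything specified in the paper; it groups VLM-generated captions by camera site and a coarse time-of-day period for reporting.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical detection records as a detector + vision-language pipeline
# might emit them; field names and values are illustrative assumptions.
detections = [
    {"site": "A", "time": "2024-12-01T06:10", "caption": "two red deer grazing near the stream"},
    {"site": "A", "time": "2024-12-01T22:45", "caption": "a badger foraging in leaf litter"},
    {"site": "B", "time": "2024-12-02T06:55", "caption": "a red deer crossing a forest track"},
]

def report_by_site_and_period(records):
    """Aggregate captions by camera site and coarse time of day."""
    def period(ts):
        hour = datetime.fromisoformat(ts).hour
        if 5 <= hour < 9:
            return "dawn"
        return "day" if 9 <= hour < 18 else "night"
    summary = defaultdict(list)
    for r in records:
        summary[(r["site"], period(r["time"]))].append(r["caption"])
    return dict(summary)

print(report_by_site_and_period(detections))
```

A real workflow would substitute model outputs for the hard-coded records; the aggregation logic is the part that enables queries like "what was seen at site A at night".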
Sensors (Basel)
December 2024
Department of Computer Engineering, Gachon University, Sujeong-gu, Seongnam-si 13120, Republic of Korea.
Generating accurate and contextually rich captions for images and videos is essential for various applications, from assistive technology to content recommendation. However, challenges such as maintaining temporal coherence in videos, reducing noise in large-scale datasets, and enabling real-time captioning remain significant. We introduce MIRA-CAP (Memory-Integrated Retrieval-Augmented Captioning), a novel framework designed to address these issues through three core innovations: a cross-modal memory bank, adaptive dataset pruning, and a streaming decoder.
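The retrieval-augmented idea behind a cross-modal memory bank can be sketched in a few lines. This is a minimal illustration under the assumption of precomputed embeddings, not MIRA-CAP's actual implementation (its adaptive pruning and streaming decoder are not modeled): stored caption/embedding pairs are ranked by cosine similarity to a query embedding, and the top matches serve as retrieved context.

```python
import math

# Toy memory bank of (embedding, caption) pairs; in practice the embeddings
# would come from a cross-modal encoder. All values here are made up.
memory = [
    ([1.0, 0.0, 0.0], "a dog running on the beach"),
    ([0.0, 1.0, 0.0], "a bowl of fresh fruit"),
    ([0.7, 0.7, 0.0], "a dog playing with a ball"),
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def retrieve(query_embedding, bank, k=2):
    """Return the k stored captions most similar to the query embedding."""
    ranked = sorted(bank, key=lambda item: cosine(query_embedding, item[0]), reverse=True)
    return [caption for _, caption in ranked[:k]]

print(retrieve([0.9, 0.4, 0.0], memory))
```

In a full captioning system, the retrieved captions would condition the decoder rather than being returned directly.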
Pharmaceuticals (Basel)
December 2024
Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai 200030, China.
Restless legs syndrome (RLS) is a common sensorimotor sleep disorder that affects sleep and quality of life. Much effort has been made to advance RLS pharmacotherapy; however, patients with RLS still report poor long-term symptom control. Comprehensive Mendelian randomization (MR) was performed to search for potential causal genes and drug targets using cis-pQTL data and RLS GWAS data.
Medicina (Kaunas)
December 2024
Department of Cardiac Surgery, Anthea Hospital GVM Care and Research, Via Camillo Rosalba 35/37, 70124 Bari, Italy.
In coronary artery bypass grafting (CABG) on pump, achieving optimal visualization is critical for surgical precision and safety. The use of blowers to clear the CABG anastomosis poses risks, including the formation of micro-embolic gas bubbles, which can be insidious and increase the risk of cerebral or myocardial complications. This retrospective study compares the effectiveness of irrigation mist with CO₂ versus a direct CO₂ blower without irrigation in terms of visualization, postoperative fibrillation, and micro-embolic gas activity.