A Study on Generative Models for Visual Recognition of Unknown Scenes Using a Textual Description.

Jose Martinez-Carranza, Delia Irazú Hernández-Farías, Victoria Eugenia Vazquez-Meza, Leticia Oyuki Rojas-Perez, Aldrich Alfredo Cabrera-Ponce

Sensors (Basel)

Faculty of Computer Science, Benemerita Universidad Autonoma de Puebla (BUAP), Puebla 72570, Mexico.

Published: October 2023

In this study, we investigate how generative models can help artificial agents, such as delivery drones or service robots, visualise unfamiliar destinations based solely on textual descriptions. We use generative models, such as Stable Diffusion, and embedding representations, such as CLIP and VisualBERT, to compare images generated from textual descriptions of target scenes with images of those scenes. Our research covers three key strategies: image generation, text generation, and text enhancement; the last of these uses tools such as ChatGPT to create concise textual descriptions for evaluation. The findings of this study improve our understanding of how combining generative tools with multi-modal embedding representations affects an artificial agent's ability to recognise unknown scenes. We therefore argue that this research has broad applications, particularly in drone parcel delivery, where an aerial robot can use text descriptions to identify its destination. The same concept can also be applied to other service robots tasked with delivering to unfamiliar locations, relying exclusively on user-provided textual descriptions.
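The comparison step described above can be sketched as a nearest-match search over embeddings. This minimal sketch assumes that the generated image and each candidate scene image have already been encoded into fixed-length vectors by a model such as CLIP; the encoding step itself is omitted. Cosine similarity is an illustrative choice, since the abstract does not name the similarity measure the authors used, and the function names, scene labels, and vectors below are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def rank_scenes(
    generated: Sequence[float], scenes: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Score every candidate scene against the generated-image embedding.

    Returns (scene_name, similarity) pairs sorted from best to worst match.
    """
    scored = [(name, cosine_similarity(generated, emb)) for name, emb in scenes.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


if __name__ == "__main__":
    # Toy 3-D vectors standing in for real image embeddings (hypothetical values).
    generated_embedding = [0.9, 0.1, 0.0]
    candidate_scenes = {
        "house_with_red_door": [1.0, 0.0, 0.0],
        "park_entrance": [0.0, 1.0, 0.0],
        "parking_lot": [0.0, 0.0, 1.0],
    }
    for name, score in rank_scenes(generated_embedding, candidate_scenes):
        print(f"{name}: {score:.3f}")
    # house_with_red_door ranks first (score ≈ 0.994)
```

In a real pipeline the vectors would come from an image encoder rather than being written by hand, and a threshold on the top score (an assumption, not something the abstract specifies) could decide whether any candidate is close enough to count as the destination.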

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10649081
DOI: http://dx.doi.org/10.3390/s23218757

Similar Publications

A scoping review protocol on brain PaCO2 levels at altitude.

PLoS One

January 2025

Division of Neurology, Department of Medicine, The Ottawa Hospital, Ottawa, Ontario, Canada.

Hanna Tang, Laurel Charlesworth, Manoj Lalu, Brian Dewar, Risa Shorr

Background: Aeromedical transfer of patients with ischemic stroke to access hyperacute stroke treatment is becoming increasingly common. Little is known about how rapid changes of altitude and atmospheric pressure can impact cerebral perfusion and ischemic burden. In patients with ischemic stroke, there is a theoretical possibility that this physiologic response of hypoxia-driven hyperventilation at higher altitude can lead to a relative drop in PaCO2.

Strategies for enhancing home-based cardiac rehabilitation self-management for patients with coronary heart disease: a qualitative study.

BMC Nurs

January 2025

The First Affiliated Hospital of China Medical University, No.155, Nanjing North Street, Heping District, Shenyang, Liaoning Province, China.

Zhen Yang, Xutong Zheng, Yu Gao, Chunqi Zhang, Aiping Wang

Background: Self-management is regarded as a crucial factor influencing the effectiveness of home-based cardiac rehabilitation for patients with coronary heart disease. In nursing practice, nurses employ a variety of strategies to enhance self-management of patients. However, there exists a disparity in nurses' perceptions and practical experiences with these strategies.

Text-guided small molecule generation via diffusion model.

iScience

November 2024

University of Science and Technology of China, Hefei, Anhui, China.

Yanchen Luo, Junfeng Fang, Sihang Li, Zhiyuan Liu, Jiancan Wu

Article Synopsis

The generation of specific molecules is important in fields like biology and drug discovery, but existing models struggle with complex customization based on detailed language.
TextSMOG is a new method that combines language and diffusion models to allow for text-guided small molecule generation, enhancing stability and diversity.
Experimental results demonstrate that TextSMOG effectively uses textual descriptions to generate 3D molecular structures tailored to complex requests.

The determination of patient-based experiences with smartphone-based report of awake bruxism using a diary.

Clin Oral Investig

January 2025

Department of Orofacial Pain and Dysfunction, Academic Centre for Dentistry Amsterdam (ACTA), University of Amsterdam and Vrije Universiteit Amsterdam, Amsterdam, The Netherlands.

Anna Colonna, Daniele Manfredini, Alessandro Bracci, Ovidiu Ionut Saracutu, Marco Ferrari

Objectives: In recent years, a smartphone-based ecological momentary assessment (EMA) approach for assessing awake bruxism (AB) has attracted growing interest in both clinical and research settings. The present study was designed to investigate subjects' experience using an EMA-based smartphone application to detect factors that could hamper or facilitate its use for clinical and research purposes.

Materials and Methods: Thirty-two patients with temporomandibular disorders (TMDs) pain (14 males, 18 females; mean age 28.

Cross-device and test-retest reliability of speech acoustic measurements derived from consumer-grade mobile recording devices.

Behav Res Methods

December 2024

Anhui Province Key Laboratory of Medical Physics and Technology, Institute of Health and Medical Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China.

Zian Hu, Zhenglin Zhang, Hai Li, Li-Zhuang Yang

In recent years, there has been growing interest in remote speech assessment through automated speech acoustic analysis. While the reliability of widely used features has been validated in professional recording settings, it remains unclear how the heterogeneity of consumer-grade recording devices, commonly used in nonclinical settings, impacts the reliability of these measurements. To address this issue, we systematically investigated the cross-device and test-retest reliability of classical speech acoustic measurements in a sample of healthy Chinese adults using consumer-grade equipment across three popular speech tasks: sustained phonation (SP), diadochokinesis (DDK), and picture description (PicD).
