In this study, we investigate how generative models can help artificial agents, such as delivery drones or service robots, visualise unfamiliar destinations based solely on textual descriptions. We use generative models, such as Stable Diffusion, together with embedding representations, such as CLIP and VisualBERT, to compare images generated from textual descriptions of target scenes with real images of those scenes. Our research encompasses three key strategies: image generation, text generation, and text enhancement, the last of which uses tools such as ChatGPT to create concise textual descriptions for evaluation. The findings contribute to an understanding of how combining generative tools with multi-modal embedding representations affects an artificial agent's ability to recognise unknown scenes. We therefore argue that this research has broad applications, particularly in drone parcel delivery, where an aerial robot can use a text description to identify a destination. The concept can also be applied to other service robots tasked with delivering to unfamiliar locations while relying exclusively on user-provided textual descriptions.
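
The comparison step described above, matching an image generated from a text description against candidate scene images in an embedding space, might be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state which similarity measure was used, so cosine similarity is an assumption here, and the short vectors and scene names are hypothetical stand-ins for real CLIP or VisualBERT embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero-norm embedding")
    return dot / (norm_a * norm_b)


def best_matching_scene(generated_embedding, scene_embeddings):
    """Return the name of the candidate scene closest to the generated image.

    generated_embedding: embedding of the image generated from the text
        description (assumed to come from a model such as CLIP).
    scene_embeddings: dict mapping a scene name to the embedding of a
        real image of that scene.
    """
    scores = {
        name: cosine_similarity(generated_embedding, embedding)
        for name, embedding in scene_embeddings.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical 3-dimensional toy embeddings; real embeddings have
# hundreds of dimensions.
generated = [0.9, 0.1, 0.3]
scenes = {
    "red_door_house": [0.8, 0.2, 0.4],
    "blue_garage": [0.1, 0.9, 0.2],
    "corner_shop": [0.3, 0.3, 0.9],
}

best, scores = best_matching_scene(generated, scenes)
print(best)
```

In a real pipeline the toy vectors would be replaced by embeddings computed from the generated image and from camera images of each candidate destination; the ranking logic stays the same.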


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10649081
DOI: http://dx.doi.org/10.3390/s23218757

Publication Analysis

Top Keywords

textual descriptions: 16
generative models: 12
unknown scenes: 8
service robots: 8
embedding representations: 8
generation text: 8
textual: 5
descriptions: 5
study generative: 4
models visual: 4
