With the development of artificial intelligence and deep learning technologies, image captioning has become an important research direction at the intersection of computer vision and natural language processing. The goal of image captioning is to generate a natural language description of an image by understanding its content, and the technology has broad application prospects in fields such as image retrieval, autonomous driving, and visual question answering. Many researchers have proposed region-based image captioning methods, which generate captions from features extracted from different regions of an image. However, these methods often rely on local features and overlook the overall scene, producing captions that lack coherence and accuracy for complex scenes. In addition, existing methods often fail to extract complete semantic information from the visual data, which can lead to biased or incomplete captions. For these reasons, existing methods struggle to generate comprehensive and accurate captions. To fill this gap, we propose the Semantic Scenes Encoder (SSE) for image captioning. The SSE first extracts a scene graph from the image and integrates it into the image encoding. It then extracts a semantic graph from the captions and preserves its semantic information through a learnable attention mechanism, which we refer to as the dictionary. During caption generation, the model combines the encoded image information with the learned semantic information to produce complete and accurate captions. To verify the effectiveness of the SSE, we evaluated the model on the MSCOCO dataset. The experimental results show that the SSE improves the overall quality of the generated captions, and gains across multiple evaluation metrics further demonstrate its advantage over existing methods on the same images.
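To make the described pipeline more concrete, the sketch below shows one plausible way the SSE components could fit together. It is a minimal, hypothetical PyTorch implementation based only on the abstract: the class and parameter names (SemanticScenesEncoder, dict_size, feat_dim), the feature shapes, and the use of multi-head attention over a learnable dictionary are assumptions, not the authors' released code.

```python
# Hypothetical sketch of the SSE idea from the abstract; names, shapes,
# and module choices are assumptions, not the authors' implementation.
import torch
import torch.nn as nn

class SemanticScenesEncoder(nn.Module):
    def __init__(self, feat_dim=2048, d_model=512, dict_size=1000, vocab_size=10000):
        super().__init__()
        # Project image-region features and scene-graph node features
        # into a shared embedding space.
        self.visual_proj = nn.Linear(feat_dim, d_model)
        self.scene_graph_proj = nn.Linear(feat_dim, d_model)
        # Learnable "dictionary" intended to store semantic information
        # distilled from the semantic graphs of training captions.
        self.dictionary = nn.Parameter(torch.randn(dict_size, d_model))
        self.dict_attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
        # A standard transformer decoder generates the caption tokens.
        self.token_embed = nn.Embedding(vocab_size, d_model)
        decoder_layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(decoder_layer, num_layers=3)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, region_feats, scene_graph_feats, caption_tokens):
        # 1) Encode image regions together with scene-graph node features.
        visual = torch.cat([self.visual_proj(region_feats),
                            self.scene_graph_proj(scene_graph_feats)], dim=1)
        # 2) Retrieve semantic information from the learnable dictionary,
        #    using the visual encoding as the attention query.
        dict_keys = self.dictionary.unsqueeze(0).expand(visual.size(0), -1, -1)
        semantics, _ = self.dict_attn(visual, dict_keys, dict_keys)
        # 3) Decode the caption conditioned on both the visual encoding
        #    and the retrieved semantic memory.
        memory = torch.cat([visual, semantics], dim=1)
        hidden = self.decoder(self.token_embed(caption_tokens), memory)
        return self.lm_head(hidden)
```

Under these assumptions, the dictionary acts as a learned memory: cross-attention lets the visual encoding retrieve caption-level semantics at inference time, which is one way to combine image encoding with learned semantic information as the abstract describes.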
Full text: PMC (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11507651) | DOI (http://dx.doi.org/10.3390/e26100876)