Captioning is the process of generating a textual description of an image. In image captioning there are two main classes of objects to describe, foreground objects and background objects, and previous research has usually focused on the foreground. In contrast, generating captions for geological images of rocks focuses largely on the background of the image. This study proposes an image-captioning model built from a convolutional neural network (CNN), long short-term memory (LSTM), and word2vec, which predicts words from an image through a dense output layer of 256 units. To produce grammatical output, the sequence of predicted words is reconstructed into a sentence by a beam search algorithm with K = 3. The pre-trained VGG16 baseline and our proposed CNN-A, CNN-B, CNN-C, and CNN-D models were evaluated with N-gram BLEU scores. These models achieved BLEU-1 scores of 0.5515, 0.6463, 0.7012, 0.7620, and 0.5620, respectively; BLEU-2 scores of 0.6048, 0.6507, 0.7083, 0.8756, and 0.6578; BLEU-3 scores of 0.6414, 0.6892, 0.7312, 0.8861, and 0.7307; and BLEU-4 scores of 0.6526, 0.6504, 0.7345, 0.8250, and 0.7537. The CNN-C model outperformed the other models, especially the baseline. Future challenges in studying geological captions include geological sentence structure, geological sentence phrasing, and constructing words with a geological tagger.
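To illustrate the decoding step described above, the sketch below shows a generic beam search with K = 3 that reconstructs a caption from per-step word probabilities. It is a minimal, self-contained example, not the paper's implementation: the VOCAB list and the predict_next() function are hypothetical stand-ins so the code runs on its own; in the study's setup the next-word distribution would come from the trained CNN-LSTM-word2vec network with its 256-unit dense output.

```python
import numpy as np

# Hypothetical vocabulary and next-word model, used only for this sketch.
VOCAB = ["<start>", "<end>", "the", "rock", "surface", "is", "layered", "grey"]

def predict_next(image_features, partial_sequence):
    """Stand-in for the trained captioning model: returns a probability
    distribution over VOCAB for the next word given the image and the
    words generated so far."""
    rng = np.random.default_rng(seed=len(partial_sequence))
    logits = rng.normal(size=len(VOCAB))
    probs = np.exp(logits)
    return probs / probs.sum()

def beam_search(image_features, k=3, max_len=20):
    """Keep the k partial captions with the highest cumulative
    log-probability at every decoding step (K = 3 in the paper)."""
    beams = [(["<start>"], 0.0)]  # (word sequence, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == "<end>":
                candidates.append((seq, score))  # finished caption, keep as-is
                continue
            probs = predict_next(image_features, seq)
            for idx in np.argsort(probs)[-k:]:   # expand only the k best words
                candidates.append((seq + [VOCAB[idx]],
                                   score + float(np.log(probs[idx]))))
        # Prune back to the k best partial captions.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
        if all(seq[-1] == "<end>" for seq, _ in beams):
            break
    best_seq, _ = beams[0]
    return " ".join(w for w in best_seq if w not in ("<start>", "<end>"))

print(beam_search(image_features=None, k=3))
```

The generated captions are then scored against reference captions by counting 1- to 4-gram overlaps, which is what the BLEU-1 through BLEU-4 figures above measure.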


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9693370
DOI: http://dx.doi.org/10.3390/jimaging8110294

Publication Analysis

Top Keywords

generating captions (8); captions geological (8); convolutional neural (8); neural network (8); long short-term (8); short-term memory (8); baseline model (8); geological sentence (8); geological (5); hybrid deep (4)

Similar Publications

Academic data processing is crucial in scientometrics and bibliometrics for tasks such as research trend analysis and citation recommendation. Existing datasets in this domain have predominantly concentrated on textual data, overlooking the importance of visual elements. To bridge this gap, we introduce a multidisciplinary multimodal aligned dataset (MMAD) specifically designed for academic data processing.


Importance: Lung ultrasound (LUS) aids in the diagnosis of patients with dyspnea, including those with cardiogenic pulmonary edema, but requires technical proficiency for image acquisition. Previous research has demonstrated the effectiveness of artificial intelligence (AI) in guiding novice users to acquire high-quality cardiac ultrasound images, suggesting its potential for broader use in LUS.

Objective: To evaluate the ability of AI to guide acquisition of diagnostic-quality LUS images by trained health care professionals (THCPs).


Particulate matter and potentially toxic element content in urban ornamental plant species to assess pollutants trapping capacity.

J Environ Manage

February 2025

Department of Plant Biology and Ecology, University of Seville, Avda. Reina Mercedes S/n, Apartado de Correos, 1095, 41012, Sevilla, Spain.

Urban environments are usually polluted by anthropogenic activities such as traffic, a major source of potentially toxic elements (PTEs), and ornamental plant species may reduce contamination by trapping traffic-related air pollutants in their leaves. The purpose of this study was to test the pollutant-trapping capacity of four species commonly used in green areas of the city of Seville (SW Spain) to better inform species choice in urban green planning. The composition of particulate matter (PM) collected from foliar surfaces (sPM) and included within waxes (wPM) was determined by EDX-SEM analysis in samples from different city locations.


Generating accurate and contextually rich captions for images and videos is essential for various applications, from assistive technology to content recommendation. However, challenges such as maintaining temporal coherence in videos, reducing noise in large-scale datasets, and enabling real-time captioning remain significant. We introduce MIRA-CAP (Memory-Integrated Retrieval-Augmented Captioning), a novel framework designed to address these issues through three core innovations: a cross-modal memory bank, adaptive dataset pruning, and a streaming decoder.


The COVID-19 pandemic provided an ideal scenario for studying the care of the elderly population, so we implemented a tool named the Geriatric Measure (GM) tool to determine severity and the need for hospitalization. The objective of the study was to evaluate whether the results of a brief GM tool are associated with mortality and other outcomes among older adults with COVID-19 treated in the emergency department. This was a retrospective observational cohort study.

