Captioning is the process of generating a textual description of an image. In image captioning there are two main classes of objects to describe, foreground objects and background objects, and previous research has usually focused on the foreground. In contrast, generating captions for geological images of rocks focuses largely on the background of the image. This study proposes an image-captioning model built from a convolutional neural network (CNN), long short-term memory (LSTM), and word2vec, which predicts words from an image through a dense output layer of 256 units. To produce grammatical output, the sequence of predicted words is reconstructed into a sentence by a beam search algorithm with K = 3. The pre-trained VGG16 baseline and our proposed CNN-A, CNN-B, CNN-C, and CNN-D models were evaluated with N-gram BLEU scores. These models achieved BLEU-1 scores of 0.5515, 0.6463, 0.7012, 0.7620, and 0.5620, respectively; BLEU-2 scores of 0.6048, 0.6507, 0.7083, 0.8756, and 0.6578; BLEU-3 scores of 0.6414, 0.6892, 0.7312, 0.8861, and 0.7307; and BLEU-4 scores of 0.6526, 0.6504, 0.7345, 0.8250, and 0.7537. The CNN-C model outperformed the other models, especially the baseline. Future challenges in studying geological captions include geological sentence structure, geological sentence phrasing, and constructing words with a geological tagger.
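To illustrate the decoding step described above, the sketch below shows a generic beam search with K = 3 that reconstructs a caption from per-step word probabilities. It is a minimal, self-contained example, not the paper's implementation: the VOCAB list and the predict_next() function are hypothetical stand-ins so the code runs on its own; in the study's setup the next-word distribution would come from the trained CNN-LSTM-word2vec network with its 256-unit dense output.

```python
import numpy as np

# Hypothetical vocabulary and next-word model, used only for this sketch.
VOCAB = ["<start>", "<end>", "the", "rock", "surface", "is", "layered", "grey"]

def predict_next(image_features, partial_sequence):
    """Stand-in for the trained captioning model: returns a probability
    distribution over VOCAB for the next word given the image and the
    words generated so far."""
    rng = np.random.default_rng(seed=len(partial_sequence))
    logits = rng.normal(size=len(VOCAB))
    probs = np.exp(logits)
    return probs / probs.sum()

def beam_search(image_features, k=3, max_len=20):
    """Keep the k partial captions with the highest cumulative
    log-probability at every decoding step (K = 3 in the paper)."""
    beams = [(["<start>"], 0.0)]  # (word sequence, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == "<end>":
                candidates.append((seq, score))  # finished caption, keep as-is
                continue
            probs = predict_next(image_features, seq)
            for idx in np.argsort(probs)[-k:]:   # expand only the k best words
                candidates.append((seq + [VOCAB[idx]],
                                   score + float(np.log(probs[idx]))))
        # Prune back to the k best partial captions.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
        if all(seq[-1] == "<end>" for seq, _ in beams):
            break
    best_seq, _ = beams[0]
    return " ".join(w for w in best_seq if w not in ("<start>", "<end>"))

print(beam_search(image_features=None, k=3))
```

The generated captions are then scored against reference captions by counting 1- to 4-gram overlaps, which is what the BLEU-1 through BLEU-4 figures above measure.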


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9693370
DOI: http://dx.doi.org/10.3390/jimaging8110294

Publication Analysis

Top Keywords

generating captions (8); captions geological (8); convolutional neural (8); neural network (8); long short-term (8); short-term memory (8); baseline model (8); geological sentence (8); geological (5); hybrid deep (4)

Similar Publications

Academic data processing is crucial in scientometrics and bibliometrics for tasks such as research trend analysis and citation recommendation. Existing datasets in this domain have predominantly concentrated on textual data, overlooking the importance of visual elements. To bridge this gap, we introduce a multidisciplinary multimodal aligned dataset (MMAD) specifically designed for academic data processing.


Importance: Lung ultrasound (LUS) aids in the diagnosis of patients with dyspnea, including those with cardiogenic pulmonary edema, but requires technical proficiency for image acquisition. Previous research has demonstrated the effectiveness of artificial intelligence (AI) in guiding novice users to acquire high-quality cardiac ultrasound images, suggesting its potential for broader use in LUS.

Objective: To evaluate the ability of AI to guide acquisition of diagnostic-quality LUS images by trained health care professionals (THCPs).


Particulate matter and potentially toxic element content in urban ornamental plant species to assess pollutants trapping capacity.

J Environ Manage

February 2025

Department of Plant Biology and Ecology, University of Seville, Avda. Reina Mercedes S/n, Apartado de Correos, 1095, 41012, Sevilla, Spain.

Urban environments are usually polluted by anthropogenic activities such as traffic, a major source of potentially toxic elements (PTEs), and ornamental plant species may reduce contamination by trapping traffic-related air pollutants in their leaves. The purpose of this study was to test the pollutant-trapping capacity of four species commonly used in green areas of the city of Seville (SW Spain) to better inform species choice in urban green planning. The composition of particulate matter (PM) collected from foliar surfaces (sPM) and included within waxes (wPM) was determined by EDX-SEM analysis in samples from different city locations.


Generating accurate and contextually rich captions for images and videos is essential for various applications, from assistive technology to content recommendation. However, challenges such as maintaining temporal coherence in videos, reducing noise in large-scale datasets, and enabling real-time captioning remain significant. We introduce MIRA-CAP (Memory-Integrated Retrieval-Augmented Captioning), a novel framework designed to address these issues through three core innovations: a cross-modal memory bank, adaptive dataset pruning, and a streaming decoder.


The COVID-19 pandemic provided an ideal scenario for studying the care of the elderly population, so we implemented a tool named the Geriatric Measure (GM) tool to determine severity and the need for hospitalization. The objective of the study was to evaluate whether the results of a brief GM tool are associated with mortality and other outcomes among older adults with COVID-19 treated in the emergency department. This was a retrospective observational cohort study.

