Insights into Object Semantics: Leveraging Transformer Networks for Advanced Image Captioning.

Sensors (Basel)

School of Computer Science, University of Lincoln, Lincoln LN6 7TS, UK.

Published: March 2024

Image captioning generates descriptive captions for images. Typically, a Convolutional Neural Network (CNN) serves as the encoder that extracts visual features, and a decoder, often based on Recurrent Neural Networks (RNNs), generates the captions. Recently, the self-attention mechanism has been widely adopted in this encoder-decoder architecture. However, this approach still faces challenges that require further research. One such challenge is that the extracted visual features do not fully exploit the available image information, primarily because they lack semantic concepts, which limits how well the model can comprehend the content depicted in the image. To address this issue, we present a new image-Transformer-based model boosted with image object semantic representation. Our model incorporates semantic representation in encoder attention, enhancing visual features by integrating instance-level concepts. Additionally, we employ a Transformer as the decoder in the language generation module. By doing so, we achieve improved performance in generating accurate and diverse captions. We evaluated our model on the MS-COCO dataset and the novel MACE dataset. The results show that our model is comparable to state-of-the-art approaches in caption generation.
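The abstract does not say how the semantic representation enters the encoder attention. As a rough illustration only, the sketch below assumes one plausible reading: image-region features act as queries and attend over a memory made of the region features plus object-concept embeddings, using standard scaled dot-product attention. The function names (`attend`, `semantic_encoder_attention`), the concatenation-based fusion, and the random toy vectors are all assumptions for illustration, not the authors' implementation.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        # Weighted sum of the value vectors, one output vector per query.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


def semantic_encoder_attention(region_feats, concept_embs):
    """Hypothetical fusion (an assumption, not the paper's method):
    regions attend over regions plus object-concept embeddings."""
    memory = region_feats + concept_embs  # concatenate along the sequence axis
    return attend(region_feats, memory, memory)


# Toy inputs: 5 region features and 3 object-concept embeddings of width 8.
random.seed(0)
d = 8
region_feats = [[random.gauss(0, 1) for _ in range(d)] for _ in range(5)]
concept_embs = [[random.gauss(0, 1) for _ in range(d)] for _ in range(3)]

enhanced = semantic_encoder_attention(region_feats, concept_embs)
print(len(enhanced), len(enhanced[0]))  # 5 8
```

Each enhanced region vector is a convex combination of visual and concept vectors, so object-level semantics can influence every region while the output keeps the same shape as the input region features. A real model would add learned projections, multiple heads, and residual connections.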

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10975165
DOI: http://dx.doi.org/10.3390/s24061796

Publication Analysis

Top Keywords: visual features, image captioning, semantic representation, image, model, insights object, object semantics, semantics leveraging, leveraging transformer, transformer networks

Similar Publications

Visualizing Preosteoarthritis: Updates on UTE-Based Compositional MRI and Deep Learning Algorithms.

J Magn Reson Imaging

January 2025

Department of Radiology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China.

Dong Sun Gang Wu Wei Zhang Nadeer M Gharaibeh Xiaoming Li

Osteoarthritis (OA) is heterogeneous and involves structural changes in the whole joint, such as cartilage, meniscus/labrum, ligaments, and tendons, mainly with short T2 relaxation times. Detecting OA before the onset of irreversible changes is crucial for early proactive management and for limiting the growing disease burden. More recent advanced quantitative imaging techniques and deep learning (DL) algorithms in musculoskeletal imaging have shown great potential for visualizing "pre-OA.

Infectious keratitis following photorefractive keratectomy: a 13-year study at a tertiary center.

J Ophthalmic Inflamm Infect

January 2025

School of Medicine, Shahid Sadoughi University of Medical Sciences, Yazd, Iran.

Alireza Attar Hossein Jamali Julio Ortega-Usobiaga Golnoush Mahmoudinezhad Dagny Zhu

Introduction: Infectious keratitis is a rare but devastating complication following photorefractive keratectomy (PRK) that may lead to visual impairment. This study assessed the clinical features, treatment strategies, and outcomes of post-PRK infectious keratitis.

Methods: This retrospective study was conducted on patients with post-PRK infectious keratitis presenting to Khalili Hospital, Shiraz, Iran, from June 2011 to March 2024.

Deep learning-based lymph node metastasis status predicts prognosis from muscle-invasive bladder cancer histopathology.

World J Urol

January 2025

Department of Urology, Renmin Hospital of Wuhan University, 99 Zhang Zhi-dong Road, Wuhan, Hubei, 430060, P.R. China.

Qingyuan Zheng Panpan Jiao Rui Yang Junjie Fan Yunxun Liu

Purpose: To develop a deep learning (DL) model based on primary tumor tissue to predict the lymph node metastasis (LNM) status of muscle-invasive bladder cancer (MIBC), while validating the prognostic value of the predicted aiN score in MIBC patients.

Methods: A total of 323 patients from The Cancer Genome Atlas (TCGA) were used as the training and internal validation set, with image features extracted using a visual encoder called UNI. We investigated the ability to predict LNM status while assessing the prognostic value of aiN score.

[Spontaneous craniocervical dissection].

Radiologie (Heidelb)

January 2025

Klinik für diagnostische und interventionelle Neuroradiologie, Universitätskliniken des Saarlandes, Kirrberger Str., 66421, Homburg Saar, Deutschland.

Malvina Garner

Performance: Spontaneous dissections of the cerebral arteries are among the leading causes of stroke in young adults. They result from hemorrhage into the outer layers of the arterial wall, which can lead to stenosis or even complete vessel occlusion. Clinical presentations vary, ranging from localized pain to cerebral ischemic complications.

Mapping patterns of thought onto brain activity during movie-watching.

Elife

January 2025

Department of Psychology, Queen's University, Kingston, Canada.

Raven Star Wallace Bronte Mckeown Ian Goodall-Halliwell Louis Chitiz Philippe Forest

Movie-watching is a central aspect of our lives and an important paradigm for understanding the brain mechanisms behind cognition as it occurs in daily life. Contemporary views of ongoing thought argue that the ability to make sense of events in the 'here and now' depends on the neural processing of incoming sensory information by auditory and visual cortex, which are kept in check by systems in association cortex. However, we currently lack an understanding of how patterns of ongoing thoughts map onto the different brain systems when we watch a film, partly because methods of sampling experience disrupt the dynamics of brain activity and the experience of movie-watching.
