Image captioning is the task of generating descriptive captions for images. It typically employs a Convolutional Neural Network (CNN) as the encoder to extract visual features and a decoder, often based on Recurrent Neural Networks (RNNs), to generate the captions. Recently, the self-attention mechanism has been widely adopted in encoder-decoder architectures. However, this approach still faces challenges that require further research. One such challenge is that the extracted visual features do not fully exploit the available image information, primarily because they lack semantic concepts. This limits the model's ability to fully comprehend the content depicted in the image. To address this issue, we present a new image-Transformer-based model enhanced with semantic representations of image objects. Our model incorporates these semantic representations into the encoder's attention, enriching the visual features with instance-level concepts. Additionally, we employ a Transformer as the decoder in the language generation module. In this way, the model achieves improved performance in generating accurate and diverse captions. We evaluated our model on the MS-COCO dataset and the novel MACE dataset. The results show that our model performs comparably to state-of-the-art approaches in caption generation.
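To make the encoder-side idea more concrete, here is a minimal sketch, not the authors' implementation, of one way instance-level concept embeddings could be added to encoder self-attention: each visual region feature attends, via scaled dot-product attention, over a memory that concatenates the region features with the concept embeddings. It assumes the region features and concept embeddings already share one feature dimension and, for brevity, uses identity query/key/value projections instead of learned weight matrices. The functions `attention` and `semantic_augmented_encoder_attention` are hypothetical names chosen for illustration.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention(query: Vector, keys: List[Vector], values: List[Vector]) -> Tuple[Vector, List[float]]:
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    scores = [dot(query, k) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    # Output is the attention-weighted sum of the value vectors.
    out = [0.0] * len(values[0])
    for w, v in zip(weights, values):
        for i, x in enumerate(v):
            out[i] += w * x
    return out, weights


def semantic_augmented_encoder_attention(
    region_feats: List[Vector], concept_embs: List[Vector]
) -> List[Vector]:
    """Hypothetical fusion: each region attends over regions + concept embeddings.

    Assumes identity Q/K/V projections and a shared feature dimension.
    """
    dim = len(region_feats[0])
    if any(len(v) != dim for v in region_feats + concept_embs):
        raise ValueError("regions and concepts must share one feature dimension")
    memory = region_feats + concept_embs  # used as both keys and values
    return [attention(r, memory, memory)[0] for r in region_feats]


if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0]]  # toy visual region features
    concepts = [[1.0, 1.0]]             # toy embedding of a detected object concept
    print(semantic_augmented_encoder_attention(regions, []))        # plain self-attention
    print(semantic_augmented_encoder_attention(regions, concepts))  # concept-augmented
```

Passing an empty `concept_embs` list reduces the function to plain self-attention over the regions, so comparing the two printed results shows how the concept embeddings shift each region's representation. A practical encoder would add learned projections, multiple heads, residual connections, and layer normalization.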
Link | Source |
Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10975165 | PMC |
http://dx.doi.org/10.3390/s24061796 | DOI Listing |
J Magn Reson Imaging
January 2025
Department of Radiology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China.
Osteoarthritis (OA) is heterogeneous and involves structural changes throughout the whole joint, including the cartilage, meniscus/labrum, ligaments, and tendons, which mainly have short T2 relaxation times. Detecting OA before the onset of irreversible changes is crucial for early proactive management and for limiting the growing disease burden. More recent advanced quantitative imaging techniques and deep learning (DL) algorithms in musculoskeletal imaging have shown great potential for visualizing "pre-OA."
J Ophthalmic Inflamm Infect
January 2025
School of Medicine, Shahid Sadoughi University of Medical Sciences, Yazd, Iran.
Introduction: Infectious keratitis is a rare but devastating complication following photorefractive keratectomy (PRK) that may lead to visual impairment. This study assessed the clinical features, treatment strategies, and outcomes of post-PRK infectious keratitis.
Methods: This retrospective study was conducted on patients with post-PRK infectious keratitis presenting to Khalili Hospital, Shiraz, Iran, from June 2011 to March 2024.
World J Urol
January 2025
Department of Urology, Renmin Hospital of Wuhan University, 99 Zhang Zhi-dong Road, Wuhan, Hubei, 430060, P.R. China.
Purpose: To develop a deep learning (DL) model based on primary tumor tissue to predict the lymph node metastasis (LNM) status of muscle-invasive bladder cancer (MIBC), while validating the prognostic value of the predicted aiN score in MIBC patients.
Methods: A total of 323 patients from The Cancer Genome Atlas (TCGA) were used as the training and internal validation sets, with image features extracted using a visual encoder called UNI. We investigated the ability to predict LNM status while assessing the prognostic value of the aiN score.
Radiologie (Heidelb)
January 2025
Department of Diagnostic and Interventional Neuroradiology, Universitätskliniken des Saarlandes, Kirrberger Str., 66421, Homburg Saar, Germany.
Performance: Spontaneous dissections of the cerebral arteries are among the leading causes of stroke in young adults. They result from hemorrhage into the outer layers of the arterial wall, which can lead to stenosis or even complete vessel occlusion. Clinical presentations vary, ranging from localized pain to cerebral ischemic complications.
Elife
January 2025
Department of Psychology, Queen's University, Kingston, Canada.
Movie-watching is a central aspect of our lives and an important paradigm for understanding the brain mechanisms behind cognition as it occurs in daily life. Contemporary views of ongoing thought argue that the ability to make sense of events in the 'here and now' depends on the neural processing of incoming sensory information by auditory and visual cortex, which is kept in check by systems in association cortex. However, we currently lack an understanding of how patterns of ongoing thought map onto the different brain systems when we watch a film, partly because methods of sampling experience disrupt both the dynamics of brain activity and the experience of movie-watching.