Camera-based passive dietary intake monitoring can continuously capture the eating episodes of a subject, recording rich visual information, such as the type and volume of food being consumed, as well as the eating behaviors of the subject. However, no existing method incorporates these visual cues to provide a comprehensive context of dietary intake from passive recording (e.g., is the subject sharing food with others, what food the subject is eating, and how much food is left in the bowl). At the same time, privacy is a major concern when egocentric wearable cameras are used for recording. In this article, we propose a privacy-preserving solution (i.e., egocentric image captioning) for dietary assessment with passive monitoring, which unifies food recognition, volume estimation, and scene understanding. By converting images into rich text descriptions, nutritionists can assess individual dietary intake based on the captions instead of the original images, reducing the risk of privacy leakage from images. To this end, an egocentric dietary image captioning dataset has been built, which consists of in-the-wild images captured by head-worn and chest-worn cameras in field studies in Ghana. A novel transformer-based architecture is designed to caption egocentric dietary images. Comprehensive experiments have been conducted to evaluate the effectiveness and to justify the design of the proposed architecture for egocentric dietary image captioning. To the best of our knowledge, this is the first work that applies image captioning for dietary intake assessment in real-life settings.
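The abstract does not detail the proposed architecture, but any transformer-based captioner of this kind rests on scaled dot-product attention, in which decoder (caption token) queries attend over encoded image features. A minimal NumPy sketch of that attention step, with illustrative toy shapes and variable names that are not taken from the paper:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Each query row attends over all key rows and returns a
    weighted mixture of the value rows (plus the weights)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (n_queries, n_keys) similarities
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights

# Toy example: 2 caption-token queries attending over 3 image-region features.
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out, w = scaled_dot_product_attention(Q, K, V)
```

This is only a sketch of the generic mechanism; the paper's actual encoder/decoder design, feature extractor, and training setup are not specified in the abstract above.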
DOI: http://dx.doi.org/10.1109/TCYB.2023.3243999
J Clin Med, November 2024. Department of Neurosurgery, College of Medicine, The University of Tennessee Health Sciences, Memphis, TN 38163, USA.
Lumbar spinal stenosis (LSS) is a major cause of chronic lower back and leg pain, and is traditionally diagnosed through labor-intensive analysis of magnetic resonance imaging (MRI) scans by radiologists. This study aims to streamline the diagnostic process by developing an automated radiology report generation (ARRG) system using a vision-language (VL) model. We utilized a Generative Image-to-Text (GIT) model, originally designed for visual question answering (VQA) and image captioning.
Body Image, December 2024. North Yorkshire County Council, North Yorkshire, UK.
Appearance-related content is ubiquitous across highly visual social media platforms, in both imagery and text. The present study aims to explore the content of text-based interactions initiated by self-images on Instagram. Seventeen adolescent girls from the UK (Age M = 15.
JAMA Otolaryngol Head Neck Surg, December 2024. Department of Otolaryngology-Head and Neck Surgery, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada.
Importance: Diagnosis of pediatric ankyloglossia and other oral ties is increasing in part due to social media, leading to more frenotomies and excess medicalization of often normal anatomy.
Objective: To assess the accuracy and readability of social media content on pediatric ankyloglossia and other oral ties.
Design, Setting, And Participants: In this cross-sectional study, the top 200 posts on an image-based social media platform tagged with #tonguetie, #liptie, or #buccaltie were collected using a de novo account on March 27, 2023.
Disabil Rehabil Assist Technol, December 2024. Department of Informatics, Universidade Federal de Viçosa - UFV, Viçosa, Brazil.
Background: Existing image description methods when used as Assistive Technologies often fall short in meeting the needs of blind or low vision (BLV) individuals. They tend to either compress all visual elements into brief captions, create disjointed sentences for each image region, or provide extensive descriptions.
Purpose: To address these limitations, we introduce VIIDA, a procedure aimed at the Visually Impaired which implements an Image Description Approach, focusing on webinar scenes.
Comput Struct Biotechnol J, December 2024. Department of Electrical and Computer Engineering, University of Houston, United States.
In the rapidly evolving landscape of medical imaging, the integration of artificial intelligence (AI) with clinical expertise offers unprecedented opportunities to enhance diagnostic precision and accuracy. Yet, the "black box" nature of AI models often limits their integration into clinical practice, where transparency and interpretability are important. This paper presents a novel system leveraging the Large Multimodal Model (LMM) to bridge the gap between AI predictions and the cognitive processes of radiologists.