Background: Food image recognition, a crucial step in computational gastronomy, has diverse applications across nutritional platforms. Convolutional neural networks (CNNs) are widely used for this task due to their ability to capture hierarchical features. However, they struggle with long-range dependencies and global feature extraction, which are vital for distinguishing visually similar foods or images where the context of the whole dish matters, motivating the use of transformer architectures.

Objectives: This research explores the combined capabilities of CNNs and transformers to build a robust classification model that handles both short- and long-range dependencies, together with global features, to accurately classify food images and enhance food image recognition for better nutritional analysis.

Methods: Our approach, which combines CNNs and Vision Transformers (ViTs), begins with a ResNet50 backbone model. This model is responsible for local feature extraction from the input image. The resulting feature map is then passed to the ViT encoder block, which handles further global feature extraction and classification using multi-head attention and fully connected layers with pre-trained weights.
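The hybrid pipeline described above can be sketched in miniature as follows. This is an illustrative NumPy toy, not the paper's implementation: the 7×7×2048 feature-map shape assumes a standard ResNet50 final stage, the single attention head stands in for the full multi-head ViT encoder, the random projection matrices replace pre-trained weights, and the 101-class head is a hypothetical choice (e.g., a Food-101-sized label set).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, d_k):
    # Illustrative single-head self-attention with random projections
    # standing in for the ViT encoder's learned Q/K/V weights.
    rng = np.random.default_rng(0)
    d = tokens.shape[-1]
    Wq, Wk, Wv = (rng.standard_normal((d, d_k)) * 0.02 for _ in range(3))
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = softmax(Q @ K.T / np.sqrt(d_k))  # every token attends to every other
    return scores @ V

# Stand-in for a ResNet50 output feature map: 7x7 spatial grid, 2048 channels.
feature_map = np.random.default_rng(1).standard_normal((7, 7, 2048))

# Flatten the spatial grid into 49 tokens so the transformer can model
# long-range (global) relations between all image regions at once.
tokens = feature_map.reshape(-1, 2048)      # (49, 2048)
attended = self_attention(tokens, d_k=64)   # (49, 64)

# Global average pool over tokens, then a linear classifier head.
pooled = attended.mean(axis=0)              # (64,)
W_cls = np.random.default_rng(2).standard_normal((64, 101)) * 0.02
logits = pooled @ W_cls                     # scores for a hypothetical 101 food classes
```

The token-flattening step is the key design choice: it converts the CNN's local feature grid into a sequence, letting attention relate distant regions of the dish that convolutions alone cannot connect.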

Results: Our experiments on five diverse datasets confirmed superior performance compared to current state-of-the-art methods, and our combined dataset, leveraging complementary features, showed enhanced generalizability and robust performance in addressing global food diversity. We used explainable-AI techniques such as Grad-CAM and LIME to understand how the models made their decisions, thereby enhancing the user's trust in the proposed system. This model has been integrated into a mobile application for food recognition and nutrition analysis, offering features like an intelligent diet-tracking system.
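Grad-CAM, one of the explainability techniques mentioned above, highlights the image regions driving a prediction by weighting each convolutional feature map by its spatially averaged gradient and keeping only positive contributions. A toy NumPy illustration (random arrays stand in for real activations and gradients; the 2048×7×7 shape is an assumed ResNet50 final-stage layout):

```python
import numpy as np

def grad_cam(activations, gradients):
    """Toy Grad-CAM: both inputs are (channels, H, W) arrays from the last
    conv layer, gradients taken w.r.t. the predicted class score."""
    # Channel weights: global-average-pool the gradients over space.
    weights = gradients.mean(axis=(1, 2))                              # (channels,)
    # Weighted sum of feature maps; ReLU keeps only positive influence.
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0)
    # Normalize to [0, 1] so the map can be overlaid on the input image.
    if cam.max() > 0:
        cam /= cam.max()
    return cam

rng = np.random.default_rng(0)
acts = rng.standard_normal((2048, 7, 7))   # hypothetical last-conv activations
grads = rng.standard_normal((2048, 7, 7))  # gradients of class score w.r.t. acts
heatmap = grad_cam(acts, grads)            # (7, 7) coarse saliency map
```

The resulting low-resolution map is typically upsampled to the input size and overlaid as a heatmap, showing which parts of the dish the model attended to for its decision.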

Conclusion: This research paves the way for practical applications in personalized nutrition and healthcare, showcasing the extensive potential of AI in nutritional sciences across various dietary platforms.


Source: http://dx.doi.org/10.3390/nu17020362


Similar Publications

Optical techniques, such as functional near-infrared spectroscopy (fNIRS), hold high potential for the development of non-invasive wearable systems for evaluating cerebral vascular condition in aging, due to their portability and ability to monitor real-time changes in cerebral hemodynamics. In this study, thirty-six healthy adults were measured by single-channel fNIRS to explore differences between two age groups using machine learning (ML). The subjects, measured during functional magnetic resonance imaging (fMRI) at Oulu University Hospital, were divided into young (age ≤ 32) and elderly (age ≥ 57) groups.


An automatic cervical cell classification model based on improved DenseNet121. Sci Rep, January 2025. Department of Biomedical Engineering, School of Life Science and Technology, Changchun University of Science and Technology, Changchun, 130022, China.

Cervical cell classification can determine the degree of cellular abnormality and pathological condition, helping doctors detect the risk of cervical cancer at an early stage and improving cure and survival rates for cervical cancer patients. To address the low accuracy of cervical cell classification, a deep convolutional neural network, A2SDNet121, is proposed, with DenseNet121 as the backbone network.


A vision model for automated frozen tuna processing. Sci Rep, January 2025. School of Food and Pharmacy, Zhejiang Ocean University, Zhoushan, 316022, People's Republic of China.

Accurate and rapid segmentation of key parts of frozen tuna, along with precise pose estimation, is crucial for automated processing. However, challenges such as size differences and indistinct features of tuna parts, as well as the complexity of determining fish poses in multi-fish scenarios, hinder this process. To address these issues, this paper introduces TunaVision, a vision model based on YOLOv8 designed for automated tuna processing.


Exploring the potential of advanced artificial intelligence technology in predicting microsatellite instability (MSI) and Ki-67 expression of endometrial cancer (EC) is highly significant. This study aimed to develop a novel hybrid radiomics approach integrating multiparametric magnetic resonance imaging (MRI), deep learning, and multichannel image analysis for predicting MSI and Ki-67 status. A retrospective study included 156 EC patients who were subsequently categorized into MSI and Ki-67 groups.


Breast cancer is one of the most aggressive types of cancer, and its early diagnosis is crucial for reducing mortality rates and ensuring timely treatment. Computer-aided diagnosis systems provide automated mammography image processing, interpretation, and grading. However, since currently existing methods suffer from issues such as overfitting, lack of adaptability, and dependence on massive annotated datasets, the present work introduces a hybrid approach to enhance breast cancer classification accuracy.

