Handling the Imbalanced Problem in Agri-Food Data Analysis.

Foods

Department of Bioresource Engineering, McGill University, 21111 Lakeshore Road, Ste-Anne-de-Bellevue, Montreal, QC H9X 3V9, Canada.

Published: October 2024

Imbalanced data, in which some classes are far less represented than others, occur in most fields. The problem has been identified as a major bottleneck in machine learning and data mining and is becoming a serious concern in food processing applications. Inappropriate analysis of agricultural and food processing data has been identified as limiting the robustness of predictive models built for agri-food applications. Because rare cases occur infrequently, classification rules that detect small groups are scarce, so samples belonging to small classes are largely misclassified. Most existing machine learning algorithms, including K-means, decision trees, and support vector machines (SVMs), are not optimal for handling imbalanced data. Consequently, models developed from such data are prone to rejection and non-adoption in real industrial and commercial settings. This paper demonstrates the reality of the imbalanced data problem in agri-food applications and proposes state-of-the-art artificial intelligence approaches for handling it, including data resampling, one-class learning, ensemble methods, feature selection, and deep learning techniques. It further evaluates existing and newer metrics that are well suited to imbalanced data. Correctly analyzing imbalanced data in food processing research will improve the accuracy of results and of the models developed, which will in turn enhance the acceptability and adoption of innovations and inventions.
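
As a rough illustration of two ideas named in the abstract (data resampling and imbalance-aware metrics), the sketch below uses only the Python standard library. It is not the paper's method: the 95/5 "sound" versus "defective" split, the placeholder feature values, and the helper names `random_oversample`, `recall_per_class`, `accuracy`, and `balanced_accuracy` are all assumptions made for this example. It shows how a classifier that always predicts the majority class can look accurate while missing every minority sample, and how simple random oversampling evens out the class counts.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority samples until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [s for s, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


def recall_per_class(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {cls: hits[cls] / totals[cls] for cls in totals}


def accuracy(y_true, y_pred):
    """Overall fraction of correct predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = recall_per_class(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Hypothetical data set: 95 "sound" items and 5 "defective" items.
labels = ["sound"] * 95 + ["defective"] * 5
samples = list(range(len(labels)))  # placeholder feature values

# A degenerate classifier that always predicts the majority class.
always_majority = ["sound"] * len(labels)

print(accuracy(labels, always_majority))            # 0.95
print(balanced_accuracy(labels, always_majority))   # 0.5
print(recall_per_class(labels, always_majority)["defective"])  # 0.0

# Random oversampling brings both classes to 95 samples each.
balanced_samples, balanced_labels = random_oversample(samples, labels)
print(Counter(balanced_labels))
```

In practice, oversampling of this kind would normally be applied to the training split only, so that duplicated minority samples do not leak into the evaluation data.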

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11507408
DOI: http://dx.doi.org/10.3390/foods13203300

Publication Analysis

Top Keywords

imbalanced data: 20
handling imbalanced: 12
food processing: 12
data: 9
problem agri-food: 8
agri-food applications: 8
imbalanced: 5
handling: 4
problem: 4
imbalanced problem: 4

Similar Publications

Efficacy of PARPi re-maintenance therapy for recurrent ovarian cancer.

Front Oncol

January 2025

State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou, China.

Objective: Current clinical data on the re-administration of PARPi maintenance therapy in platinum-sensitive recurrent ovarian cancer (PSROC) are limited. This study aims to investigate the efficacy and associated factors of PARPi re-maintenance therapy in PSROC patients in China.

Methods: This study included 201 patients with PSROC who had previously received maintenance therapy and achieved complete or partial response after platinum-based chemotherapy upon recurrence.

Lung cancer is a leading cause of cancer-related mortality, with disparities in incidence and outcomes observed across different racial and sex groups. Understanding the genetic factors of these disparities is critical for developing targeted treatment therapies. This study aims to identify both patient-specific and cohort-specific biomarker genes that contribute to lung cancer health disparities among African American males (AAMs), European American males (EAMs), African American females (AAFs), and European American females (EAFs).

Forecasting student performance precisely is essential for designing tailored interventions that can boost learning effectiveness. Most traditional student performance prediction models have difficulty dealing with multi-dimensional academic data, which can lead to sub-optimal classification and only simple, generalized insights. To address these challenges, this research proposes a new model, the Multi-dimensional Student Performance Prediction Model (MSPP), which is inspired by advanced data preprocessing and feature engineering techniques using deep learning.

Marine pollution due to oil spills presents major risks to coastal areas and aquatic life, leading to serious environmental health concerns. Oil spill detection using SAR data has transitioned from traditional segmentation to a variety of machine learning and deep learning models, with models such as UNET proving efficient for the task. This research paper proposes a GSCAT-UNET model for efficient oil spill detection and discrimination from lookalikes.

In the Imbalanced Multivariate Time Series Classification (ImMTSC) task, minority-class instances typically correspond to critical events, such as system faults in power grids or abnormal health occurrences in medical monitoring. Despite being rare and random, these events are highly significant. The dynamic spatial-temporal relationships between minority-class instances and other instances make them more prone to interference from neighboring instances during classification.
