Agricultural case studies of classification accuracy, spectral resolution, and model over-fitting.

Appl Spectrosc

University of Western Australia, School of Animal Biology, UWA Institute of Agriculture, 35 Stirling Highway, Crawley, Perth, Western Australia 6009, Australia.

Published: November 2013

This paper describes the relationship between spectral resolution and classification accuracy in analyses of hyperspectral imaging data acquired from crop leaves. The main aim is to discuss and reduce the risk of model over-fitting. Over-fitting of a classification model occurs when too many and/or irrelevant model terms are included (e.g., a large number of spectral bands), and it may lead to low robustness and repeatability when the classification model is applied to independent validation data. We outline a simple way to quantify the level of model over-fitting by comparing the observed classification accuracies with those obtained from random explanatory data. Hyperspectral imaging data were acquired from two crop-insect pest systems: (1) potato psyllid (Bactericera cockerelli) infestations of individual bell pepper plants (Capsicum annuum), with hyperspectral imaging data acquired under controlled-light conditions (data set 1), and (2) sugarcane borer (Diatraea saccharalis) infestations of individual maize plants (Zea mays), with hyperspectral imaging data acquired from the same plants under two markedly different image-acquisition conditions (data sets 2a and 2b). For each data set, reflectance data were analyzed at seven spectral resolutions by dividing the 160 spectral bands from 405 to 907 nm into 4, 16, 32, 40, 53, 80, or 160 bands. In the two data sets, similar classification results were obtained with spectral resolutions ranging from 3.1 to 12.6 nm. Thus, the size of the initial input data could be reduced fourfold with only a negligible loss of classification accuracy. In the analysis of data set 1, several validation approaches consistently demonstrated that insect-induced stress could be accurately detected, and there was therefore little indication of model over-fitting. In the analyses of data set 2, inconsistent validation results were obtained, and the observed classification accuracy (81.06%) was only a few percentage points above that obtained using random data (66.7-77.4%). Thus, our analysis highlights a potential risk of model over-fitting and emphasizes the importance of testing for it as part of developing reliable and robust classification models.
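
The two ideas in the abstract, reducing spectral resolution by merging bands and checking for over-fitting against random data, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the helper names (`bin_bands`, `loo_accuracy`, `random_baseline`), the use of band averaging to lower resolution, the nearest-centroid classifier with leave-one-out validation, and the synthetic spectra. The abstract does not state which classifier or validation scheme the authors used.

```python
import random
from statistics import mean


def bin_bands(spectrum, n_bins):
    """Average contiguous bands into n_bins groups of near-equal width."""
    n = len(spectrum)
    if not 1 <= n_bins <= n:
        raise ValueError("n_bins must be between 1 and the number of bands")
    edges = [i * n // n_bins for i in range(n_bins + 1)]
    return [mean(spectrum[a:b]) for a, b in zip(edges, edges[1:])]


def loo_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier (stand-in model)."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in sorted(set(y)):
            rows = [x for j, x in enumerate(X) if j != i and y[j] == label]
            if rows:
                centroids[label] = [mean(col) for col in zip(*rows)]
        # Predict the class whose centroid is closest in squared distance.
        pred = min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(X[i], centroids[lab])),
        )
        correct += pred == y[i]
    return correct / len(X)


def random_baseline(X, y, n_repeats=20, seed=0):
    """Accuracy obtained when predictors are replaced by random numbers.

    Returns the mean and maximum accuracy over n_repeats random data sets
    that have the same shape as X and keep the real class labels y.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_repeats):
        X_rand = [[rng.random() for _ in row] for row in X]
        scores.append(loo_accuracy(X_rand, y))
    return mean(scores), max(scores)


# Synthetic demo (hypothetical data): 20 leaves x 160 bands, two classes.
rng = random.Random(1)
labels = [0] * 10 + [1] * 10
spectra = [
    [rng.gauss(0.5 + (0.1 * lab if 60 <= b < 100 else 0.0), 0.05) for b in range(160)]
    for lab in labels
]

for n_bins in (160, 40, 4):
    binned = [bin_bands(s, n_bins) for s in spectra]
    observed = loo_accuracy(binned, labels)
    rand_mean, rand_max = random_baseline(binned, labels)
    print(n_bins, round(observed, 2), round(rand_mean, 2), round(rand_max, 2))
```

In this sketch, an observed accuracy that sits well above the random-data range suggests the model is picking up a real signal, whereas an observed accuracy close to that range is a warning sign of over-fitting, which is the situation the abstract reports for data set 2.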

Source
http://dx.doi.org/10.1366/12-06933

Publication Analysis

Top Keywords (with occurrence counts)

model over-fitting: 20
classification accuracy: 16
hyperspectral imaging: 16
imaging data: 16
data set: 16
data: 15
classification: 9
spectral resolution: 8
model: 8
data acquired: 8
