Background: Regularized regression methods such as principal component regression or partial least squares regression perform well in learning tasks on high-dimensional spectral data, but cannot explicitly eliminate irrelevant features. The random forest classifier with its associated Gini feature importance, on the other hand, allows for explicit feature elimination, but may not be optimally adapted to spectral data due to the topology of its constituent classification trees, which are based on orthogonal splits in feature space.
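
The Gini importance referred to above is built from the impurity decrease achieved by individual orthogonal splits, i.e. thresholds on a single feature. The following is a minimal pure-Python sketch of that building block; the function names are hypothetical. In a full random forest, the Gini importance of a feature is, roughly, the sum of such decreases (weighted by the fraction of samples reaching each node) over all nodes that split on that feature, averaged over the trees.

```python
from collections import Counter
from typing import Hashable, Sequence


def gini_impurity(labels: Sequence[Hashable]) -> float:
    """Gini impurity 1 - sum_k p_k**2 of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_impurity_decrease(
    values: Sequence[float], labels: Sequence[Hashable], threshold: float
) -> float:
    """Impurity decrease of the orthogonal split `value <= threshold` on ONE feature.

    The children's impurities are weighted by their share of the samples.
    """
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    children = len(left) / n * gini_impurity(left) + len(right) / n * gini_impurity(right)
    return gini_impurity(labels) - children


# A threshold at 0.5 separates the two classes perfectly, so the whole
# parent impurity (0.5 for a balanced two-class node) is removed.
print(split_impurity_decrease([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1], 0.5))  # → 0.5
```

A split that leaves both children as mixed as the parent (for example labels `[0, 1, 0, 1]` with the same threshold) yields a decrease of zero, so it contributes nothing to that feature's importance.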

Results: We propose to combine the best of both approaches and have evaluated the joint use of a feature selection based on recursive feature elimination using the Gini importance of random forests together with regularized classification methods on spectral data sets from medical diagnostics, chemotaxonomy, biomedical analytics and food science, and on synthetically modified spectral data. Here, a feature selection using the Gini feature importance followed by a regularized classification with discriminant partial least squares regression performed as well as or better than filtering according to different univariate statistical tests, or using regression coefficients in a backward feature elimination. It outperformed both the direct application of the random forest classifier and the direct application of the regularized classifiers on the full set of features.
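
The recursive feature elimination described above can be sketched as a generic loop: recompute feature importances on the current subset, drop the least important fraction, score the reduced subset with the downstream classifier, and remember the best-scoring subset. This is a minimal sketch under stated assumptions, not the authors' implementation: `importance_fn` and `score_fn` are hypothetical hooks (for example, Gini importances from a random forest retrained on the subset, and the cross-validated accuracy of a discriminant PLS classifier), and the 20% drop fraction is an illustrative choice.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def recursive_feature_elimination(
    features: Sequence[int],
    importance_fn: Callable[[List[int]], Dict[int, float]],
    score_fn: Callable[[List[int]], float],
    drop_fraction: float = 0.2,   # assumption: fraction of features removed per round
    min_features: int = 1,
) -> Tuple[List[int], float]:
    """Backward elimination driven by a feature-importance measure.

    importance_fn(subset) -> {feature: importance}; hypothetical hook, e.g. Gini
        importances of a random forest retrained on `subset`.
    score_fn(subset) -> higher-is-better score; hypothetical hook, e.g. the
        cross-validated accuracy of a PLS-DA classifier trained on `subset`.
    Returns the best-scoring subset seen and its score.
    """
    current = sorted(features)
    best_subset, best_score = list(current), score_fn(current)
    while len(current) > min_features:
        importances = importance_fn(current)          # recomputed on every round
        n_drop = max(1, int(len(current) * drop_fraction))
        n_keep = max(min_features, len(current) - n_drop)
        # Rank by decreasing importance (ties broken by feature index), keep the top.
        ranked = sorted(current, key=lambda f: (-importances[f], f))
        current = sorted(ranked[:n_keep])
        score = score_fn(current)
        if score > best_score:  # strict '>': on ties the earlier (larger) subset is kept
            best_subset, best_score = list(current), score
    return best_subset, best_score
```

Because the best subset is picked by its own score, that score is optimistically biased; an honest error estimate would normally wrap the whole elimination loop in an outer cross-validation.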

Conclusion: The Gini importance of the random forest provided superior means for measuring feature relevance on spectral data, but, on an optimal subset of features, the regularized classifiers might be preferable to the random forest classifier, in spite of their limitation to modelling linear dependencies only. A feature selection based on Gini importance may, however, precede a regularized linear classification to identify this optimal subset of features, and thereby gain the double benefit of dimensionality reduction and elimination of noise from the classification task.
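
To illustrate the second stage, a regularized linear classifier applied to an already selected subset of features, here is a minimal NumPy sketch of binary discriminant PLS: a PLS1 regression (NIPALS algorithm) on class labels coded as -1/+1, with the sign of the prediction giving the class. The function names, the -1/+1 coding and the toy data are assumptions for illustration; multi-class problems would typically use PLS2 on one-hot coded labels instead.

```python
import numpy as np


def fit_pls_da(X, y, n_components=2):
    """Binary PLS-DA: PLS1 regression (NIPALS) on labels coded as -1/+1.

    Returns (coef, intercept) so that sign(X @ coef + intercept) predicts the class.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean            # centered working copies, deflated below
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                          # weights: covariance with the response
        norm = np.linalg.norm(w)
        if norm < 1e-12:                       # nothing left to explain
            break
        w /= norm
        t = Xk @ w                             # latent scores
        tt = t @ t
        p = Xk.T @ t / tt                      # X loadings
        q_a = (yk @ t) / tt                    # y loading
        Xk = Xk - np.outer(t, p)               # deflate X and y
        yk = yk - q_a * t
        W.append(w)
        P.append(p)
        q.append(q_a)
    if not W:
        raise ValueError("labels are constant; no PLS component can be extracted")
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)     # B = W (P^T W)^{-1} q
    intercept = y_mean - x_mean @ coef
    return coef, intercept


def predict_pls_da(X, coef, intercept):
    """Class prediction: +1 where the PLS regression output is >= 0, else -1."""
    scores = np.asarray(X, dtype=float) @ coef + intercept
    return np.where(scores >= 0.0, 1.0, -1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.repeat([-1.0, 1.0], 20)
    X = rng.normal(scale=0.3, size=(40, 50))   # toy "spectra": 40 samples, 50 channels
    X[:, 3] += 2.0 * y                         # only channel 3 carries class information
    selected = [3, 10, 20]                     # hypothetical output of a Gini-based selection
    coef, b = fit_pls_da(X[:, selected], y, n_components=1)
    print((predict_pls_da(X[:, selected], coef, b) == y).mean())
```

The number of latent components acts as the regularization parameter: with as many components as (linearly independent) features, PLS1 reproduces the ordinary least-squares fit, while fewer components shrink the solution toward the directions of highest covariance with the labels.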

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2724423
DOI: http://dx.doi.org/10.1186/1471-2105-10-213

Similar Publications

New applications such as the Internet of Things, autonomous driving, Industry X.0 and many more will transmit sensitive information via fibers and over the air with envisioned data rates beyond terabits per second. Therefore, the encryption has to be simple, fast and spectrally efficient, so that the power consumption and latency are low and the scarce bandwidth is not wasted.

In spectral analysis, selecting the right spectral variables is crucial for effective modeling. It reduces data dimensionality, removes irrelevant wavelength points, and improves both the generalization ability and computational efficiency of the model. However, the number of available samples often falls short of the total number of possible combinations of wavelengths, making variable selection an NP-hard (non-deterministic polynomial-time hard) optimization problem.

AI integration into wavelength-based SPR biosensing: Advancements in spectroscopic analysis and detection.

Anal Chim Acta

March 2025

Artificial Intelligence Research Center, Chang Gung University, Taoyuan, 333323, Taiwan; Department of Artificial Intelligence, College of Intelligent Computing, Chang Gung University, Taoyuan, 333323, Taiwan. Electronic address:

Background: In recent years, employing deep learning methods in the biosensing area has significantly reduced data analysis time and enhanced data interpretation and prediction accuracy. In some SPR fields, research teams have further enhanced detection capabilities using deep learning techniques. However, the application of deep learning to spectroscopic surface plasmon resonance (SPR) biosensors has not been reported.

Non-optically active water quality parameters (NAWQPs) are essential for surface water quality assessments, although automated monitoring methods are time-consuming, include labor-intensive chemical pretreatment, and pose challenges for high spatiotemporal resolution monitoring. Advancements in spectroscopic techniques and machine learning may address these issues. We integrated ultraviolet-visible-near infrared absorption spectroscopy with physical-chemical measurements to predict total nitrogen (TN), dissolved oxygen (DO), and total phosphorus (TP) in the Yangtze River Basin, China.

Monitoring wetland cover changes and land surface temperatures using remote sensing and GIS in Göksu Delta.

Integr Environ Assess Manag

January 2025

Faculty of Fine Arts, Design and Architecture Department of Landscape Architecture, Tekirdağ Namık Kemal University, Tekirdağ, Türkiye.

Wetlands provide necessary ecosystem services, such as climate regulation and contribution to biodiversity at global and local scales, and they face spatial changes due to natural and anthropogenic factors. The degradation of the characteristic structure signals potential severe threats to biodiversity. This study aimed to monitor the long-term spatial changes of the Göksu Delta, a critical Ramsar site, using remote sensing techniques.
