Outlier modeling for spectral data reduction.

J Opt Soc Am A Opt Image Sci Vis

Published: July 2014

The spectra in spectral reflectance datasets tend to be highly correlated, so they can be represented more compactly using standard techniques such as principal component analysis (PCA) as part of a lossy compression strategy. However, the presence of outlier spectra can often increase the overall error of the reconstructed spectra. This paper introduces a new outlier modeling (OM) method that detects, clusters, and separately models outliers with their own set of basis vectors. Outliers are defined in terms of the robust Mahalanobis distance, which is computed from robust estimates of the multivariate mean and covariance obtained with the fast minimum covariance determinant algorithm. After removing the outliers from the main dataset, the performance of PCA on the remaining data improves significantly; however, since outlier spectra are a part of the image, they cannot simply be ignored. The solution is to group the outliers into a small number of clusters and then model each cluster separately using its own cluster-specific PCA-derived bases. Tests show that OM leads to lower reconstruction errors for reflectance spectra in terms of both normalized RMS and goodness of fit.
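
The pipeline described in the abstract (robust detection, clustering of outliers, per-cluster PCA) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes scikit-learn's MinCovDet as the FAST-MCD estimator, a chi-square quantile as the outlier cutoff, and k-means as the clustering step; the abstract specifies none of these choices. The function name fit_outlier_model, all parameter values, and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.cluster import KMeans
from sklearn.covariance import MinCovDet
from sklearn.decomposition import PCA


def fit_outlier_model(X, n_main=3, n_clusters=3, n_cluster_bases=2,
                      quantile=0.975, seed=0):
    """Hypothetical OM-style reduction: returns (reconstruction, outlier mask)."""
    # Robust mean and covariance via FAST-MCD; mahalanobis() returns
    # squared distances to the robust location.
    mcd = MinCovDet(random_state=seed).fit(X)
    d2 = mcd.mahalanobis(X)
    # Assumed cutoff: chi-square quantile, one degree of freedom per band.
    is_outlier = d2 > chi2.ppf(quantile, df=X.shape[1])

    X_hat = np.empty_like(X, dtype=float)

    # Main model: PCA bases fitted on the inlier spectra only.
    inliers = X[~is_outlier]
    main_pca = PCA(n_components=n_main).fit(inliers)
    X_hat[~is_outlier] = main_pca.inverse_transform(main_pca.transform(inliers))

    # Outlier model: cluster the outliers, then fit one PCA per cluster.
    outliers = X[is_outlier]
    if len(outliers) > 0:
        k = min(n_clusters, len(outliers))
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=seed).fit_predict(outliers)
        rec = np.empty_like(outliers, dtype=float)
        for c in range(k):
            members = outliers[labels == c]
            if members.shape[0] < 2:
                # Too few spectra for PCA: store them verbatim.
                rec[labels == c] = members
                continue
            n_comp = min(n_cluster_bases, members.shape[0], members.shape[1])
            pca_c = PCA(n_components=n_comp).fit(members)
            rec[labels == c] = pca_c.inverse_transform(pca_c.transform(members))
        X_hat[is_outlier] = rec
    return X_hat, is_outlier


# Synthetic stand-in for reflectance data: 300 correlated "spectra" with
# 8 bands near a 3-dimensional subspace, plus two planted outlier groups.
rng = np.random.default_rng(0)
basis = rng.normal(size=(3, 8))
main = rng.normal(size=(300, 3)) @ basis + rng.normal(scale=0.05, size=(300, 8))
group_a = rng.normal(scale=3.0, size=8) + rng.normal(scale=0.2, size=(20, 8))
group_b = rng.normal(scale=3.0, size=8) + rng.normal(scale=0.2, size=(20, 8))
X = np.vstack([main, group_a, group_b])

X_hat, is_outlier = fit_outlier_model(X)
inlier_rmse = np.sqrt(np.mean((X[~is_outlier] - X_hat[~is_outlier]) ** 2))
print("flagged outliers:", int(is_outlier.sum()))
print("inlier RMSE:", float(inlier_rmse))
```

Real reflectance spectra usually have many more bands and a nearly singular covariance, so a practical version would likely need dimensionality reduction or regularization before the robust covariance step. The number of clusters and bases per cluster are tuning choices that trade storage against reconstruction error.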

Source
http://dx.doi.org/10.1364/JOSAA.31.001445

Publication Analysis

Top Keywords

outlier modeling: 8
outlier spectra: 8
spectra: 5
outlier: 4
modeling spectral: 4
spectral data: 4
data reduction: 4
reduction spectra: 4
spectra spectral: 4
spectral reflectance: 4

Similar Publications

Introduction: Accurate and consistent data play a critical role in enabling health officials to make informed decisions regarding emerging trends in SARS-CoV-2 infections. Alongside traditional indicators such as the 7-day incidence rate, wastewater-based epidemiology can provide valuable insights into changes in SARS-CoV-2 concentration. However, wastewater compositions and wastewater systems are rather complex.

Background: Sepsis is a life-threatening organ dysfunction caused by a dysregulated host response to infection. It is characterized by high clinical morbidity and mortality rates, endangering patients' lives and health. The purpose of this study was to determine the value of long non-coding RNA (lncRNA) RP3_508I15.

Background: Rheumatology has experienced notable changes in the last decades. New drugs, including biologic agents and Janus kinase (JAK) inhibitors, have blossomed. Concepts such as window of opportunity, arthralgia suspicious for progression, or difficult-to-treat rheumatoid arthritis (RA) have appeared; and new management approaches and strategies such as treat-to-target have become popular.

Establishing age-group specific reference intervals of human salivary proteome and its preliminary application for epilepsy diagnosis.

Sci China Life Sci

December 2024

State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China.

Salivary proteins serve multifaceted roles in maintaining oral health and hold significant potential for diagnosing and monitoring diseases due to the non-invasive nature of saliva sampling. However, the clinical utility of current saliva biomarker studies is limited by the lack of reference intervals (RIs) to correctly interpret the testing result. Here, we developed a rapid and robust saliva proteome profiling workflow, obtaining coverage of >1,200 proteins from a 50-µL unstimulated salivary flow with 30 min gradients.

This study investigates the impact of outliers on the evolution of clusters in temporal datasets. Monitoring and tracing cluster transitions in temporal datasets allows us to observe how clusters evolve and change over time. By tracking the movement of data points between clusters, we can gain insights into the underlying patterns, trends, and dynamics of the data.
