Classification of histogram-valued data with support histogram machines.

J Appl Stat

Graduate School, Department of Urban Big Data Convergence, University of Seoul, Seoul, The Republic of Korea.

Published: July 2021

Large volumes of data and advances in technology have produced new types of complex data, such as histogram-valued data. This paper focuses on classification problems in which predictors are observed as, or aggregated into, histograms. Because conventional classification methods take vectors as input, a natural approach converts histograms into vector-valued data using summary values such as the mean or median. However, this approach discards the distributional information available in histograms. To address this issue, we propose a margin-based classifier for histogram-valued data called the support histogram machine (SHM). We adopt the support vector machine framework and use the Wasserstein-Kantorovich metric to measure distances between histograms. The resulting optimization problem is solved through its dual formulation. We then evaluate SHM on simulated and real examples and show that it outperforms summary-value-based methods.
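
The abstract does not state which order of the Wasserstein-Kantorovich metric SHM uses or how mass is assumed to be spread inside a bin. The sketch below assumes the order-2 metric and uniform mass within each bin. Under those assumptions the one-dimensional distance is W_2(a, b)^2 = ∫_0^1 (Q_a(t) - Q_b(t))^2 dt, where Q is the quantile function; Q is piecewise linear, so on each segment where the difference d(t) is linear the integral is exactly h(d_0^2 + d_0 d_1 + d_1^2)/3. The example also illustrates why summary values lose information: two histograms with the same mean and median can still be far apart. This covers only the distance ingredient, not the SHM classifier or its dual solver, and all function names are hypothetical.

```python
import math
from bisect import bisect_right
from itertools import accumulate

# Assumptions (not from the paper): order-2 Wasserstein metric, uniform mass
# inside each bin. Function names are hypothetical illustrations.


def _cumulative(probs):
    """Cumulative bin masses [0, c_1, ..., 1], normalized to sum to one."""
    total = sum(probs)
    cum = [0.0] + [c / total for c in accumulate(probs)]
    cum[-1] = 1.0  # guard against floating-point drift
    return cum


def _find_bin(cum, t):
    """Index j with cum[j] <= t < cum[j + 1]; zero-mass bins are skipped."""
    return bisect_right(cum, t) - 1


def _q_in_bin(edges, cum, j, t):
    """Linear quantile function of bin j (uniform mass inside the bin)."""
    frac = (t - cum[j]) / (cum[j + 1] - cum[j])
    return edges[j] + frac * (edges[j + 1] - edges[j])


def hist_mean(edges, probs):
    """Mean of the histogram: bin midpoints weighted by bin mass."""
    total = sum(probs)
    return sum(p * 0.5 * (lo + hi) for p, lo, hi in zip(probs, edges, edges[1:])) / total


def hist_median(edges, probs):
    """Median of the histogram, i.e. its quantile function at t = 0.5."""
    cum = _cumulative(probs)
    return _q_in_bin(edges, cum, _find_bin(cum, 0.5), 0.5)


def wasserstein2(edges_a, probs_a, edges_b, probs_b):
    """Exact 2-Wasserstein distance between two piecewise-uniform histograms."""
    cum_a, cum_b = _cumulative(probs_a), _cumulative(probs_b)
    # Between consecutive knots both quantile functions are linear in t.
    knots = sorted(set(cum_a) | set(cum_b))
    sq = 0.0
    for t0, t1 in zip(knots, knots[1:]):
        tm = 0.5 * (t0 + t1)  # the midpoint identifies the active bin of each histogram
        ja, jb = _find_bin(cum_a, tm), _find_bin(cum_b, tm)
        d0 = _q_in_bin(edges_a, cum_a, ja, t0) - _q_in_bin(edges_b, cum_b, jb, t0)
        d1 = _q_in_bin(edges_a, cum_a, ja, t1) - _q_in_bin(edges_b, cum_b, jb, t1)
        # Exact integral of a squared linear function over [t0, t1].
        sq += (t1 - t0) * (d0 * d0 + d0 * d1 + d1 * d1) / 3.0
    return math.sqrt(sq)


# Usage: same mean and median, different spread.
a = ([0, 1, 2, 3], [0, 1, 0])  # all mass uniform on [1, 2]
b = ([0, 1, 2, 3], [1, 1, 1])  # mass uniform on [0, 3]
print(hist_mean(*a), hist_mean(*b))      # 1.5 1.5
print(hist_median(*a), hist_median(*b))  # both approximately 1.5
print(wasserstein2(*a, *b))              # approximately 0.577, i.e. sqrt(1/3)
```

A mean- or median-based classifier sees these two histograms as identical, whereas the distance separates them.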

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9930853
DOI: http://dx.doi.org/10.1080/02664763.2021.1947996

Similar Publications

In recent years, there has been increased interest in symbolic data analysis, including exploratory analysis, supervised and unsupervised learning, and time series analysis. Traditional statistical approaches designed for single-valued data are not suitable because they cannot incorporate the additional information on data structure available in symbolic data; new techniques have therefore been proposed to bridge this gap. In this article, we develop a regularized convex clustering approach for grouping histogram-valued data.
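
The excerpt does not give the clustering formulation. A common form of convex clustering minimizes ½ Σ_i ‖x_i − u_i‖² + λ Σ_{i<j} w_ij ‖u_i − u_j‖ over one centroid u_i per observation; larger λ fuses more centroids together. The sketch below assumes each histogram is embedded as its quantile function sampled at k midpoints and scaled by 1/√k, so Euclidean distances approximate the 2-Wasserstein distance. It only evaluates the objective (it does not solve it) and is not claimed to be the cited article's method; function names are hypothetical.

```python
import math
from bisect import bisect_right
from itertools import accumulate, combinations

# Assumptions (not from the article): quantile-function embedding with uniform
# mass inside each bin, standard convex clustering objective. Names are hypothetical.


def quantile_embedding(edges, probs, k=100):
    """Quantile function sampled at t = (m + 0.5) / k, scaled by 1/sqrt(k)
    so that Euclidean distance approximates the 2-Wasserstein distance."""
    total = sum(probs)
    cum = [0.0] + [c / total for c in accumulate(probs)]
    cum[-1] = 1.0  # guard against floating-point drift
    scale = 1.0 / math.sqrt(k)
    out = []
    for m in range(k):
        t = (m + 0.5) / k
        j = bisect_right(cum, t) - 1  # bin with cum[j] <= t < cum[j + 1]
        frac = (t - cum[j]) / (cum[j + 1] - cum[j])
        out.append(scale * (edges[j] + frac * (edges[j + 1] - edges[j])))
    return out


def convex_clustering_objective(X, U, lam, weights):
    """0.5 * sum_i ||x_i - u_i||^2 + lam * sum_{i<j} w_ij * ||u_i - u_j||."""
    fit = 0.5 * sum(
        sum((x - u) ** 2 for x, u in zip(xi, ui)) for xi, ui in zip(X, U)
    )
    penalty = sum(
        weights[i][j] * math.dist(U[i], U[j])
        for i, j in combinations(range(len(U)), 2)
    )
    return fit + lam * penalty


# Usage: embed two histograms, then score the "no fusion" solution U = X.
X = [quantile_embedding([0, 1, 2, 3], [0, 1, 0]),
     quantile_embedding([0, 1, 2, 3], [1, 1, 1])]
print(convex_clustering_objective(X, X, lam=1.0, weights=[[0, 1], [1, 0]]))
```

With U = X the fit term vanishes and only the fusion penalty remains; with all u_i equal the penalty vanishes instead, which is the trade-off λ controls.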