The distance that compares the difference between two probability distributions plays a fundamental role in statistics and machine learning. Optimal transport (OT) theory provides a theoretical framework to study such distances. Recent advances in OT theory include a generalization of classical OT with an extra entropic constraint or regularization, called entropic OT. Despite its convenience in computation, entropic OT still lacks sufficient theoretical support. In this paper, we show that the quadratic cost in entropic OT can be upper-bounded using entropy power inequality (EPI)-type bounds. First, we prove an HWI-type inequality by making use of the infinitesimal displacement convexity of the OT map. Second, we derive two Talagrand-type inequalities using the saturation of EPI that corresponds to a numerical term in our expressions. These two new inequalities are shown to generalize two previous results obtained by Bolley et al. and Bai et al. Using the new Talagrand-type inequalities, we also show that the geometry observed by Sinkhorn distance is smoothed in the sense of measure concentration. Finally, we corroborate our results with various simulation studies.
Download full-text PDF |
Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8871052 | PMC |
http://dx.doi.org/10.3390/e24020306 | DOI Listing |
J Chem Theory Comput
January 2025
Department of Polymer Materials and Engineering, College of Materials and Metallurgy, Guizhou University, Guiyang 550025, P. R. China.
Missing data in tabular data sets is ubiquitous in statistical analysis, big data analysis, and machine learning studies. Many strategies have been proposed to impute missing data, but their reliability has not been stringently assessed in materials science. Here, we carried out a benchmark test for six imputation strategies: Mean, MissForest, HyperImpute, Gain, Sinkhorn, and a newly proposed MatImpute on seven representative data sets in materials science.
View Article and Find Full Text PDFThis article addresses the challenge of scale variations in crowd-counting problems from a multidimensional measure-theoretic perspective. We start by formulating crowd counting as a measure-matching problem, based on the assumption that discrete measures can express the scattered ground truth and the predicted density map. In this context, we introduce the Sinkhorn counting loss and extend it to the semi-balanced form, which alleviates the problems including entropic bias, distance destruction, and amount constraints.
View Article and Find Full Text PDFR Soc Open Sci
July 2024
School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia.
Single-cell technologies allow us to gain insights into cellular processes at unprecedented resolution. In stem cell and developmental biology snapshot data allow us to characterize how the transcriptional states of cells change between successive cell types. Here, we show how approximate Bayesian computation (ABC) can be employed to calibrate mathematical models against single-cell data.
View Article and Find Full Text PDFSpectrochim Acta A Mol Biomol Spectrosc
December 2024
R&D Center, China Tobacco Yunnan Industrial Co., Ltd, No. 367 Hongjin Road, Kunming 650231, China.
Due to the high-dimensionality, redundancy, and non-linearity of the near-infrared (NIR) spectra data, as well as the influence of attributes such as producing area and grade of the sample, which can all affect the similarity measure between samples. This paper proposed a t-distributed stochastic neighbor embedding algorithm based on Sinkhorn distance (St-SNE) combined with multi-attribute data information. Firstly, the Sinkhorn distance was introduced which can solve problems such as KL divergence asymmetry and sparse data distribution in high-dimensional space, thereby constructing probability distributions that make low-dimensional space similar to high-dimensional space.
View Article and Find Full Text PDFJ Chem Theory Comput
July 2024
Department of Chemistry and Pharmaceutical Sciences, Amsterdam Institute of Molecular and Life Sciences (AIMMS), Faculty of Science, Vrije Universiteit Amsterdam, De Boelelaan 1083, 1081HV Amsterdam, The Netherlands.
Understanding the character of electronic excitations is important in computational and reaction mechanistic studies, but their classification from simulations remains an open problem. Distances based on optimal transport have proven very useful in a plethora of classification problems and, therefore, seem a natural tool to try to tackle this challenge. We propose and investigate a new diagnostic Θ based on the Sinkhorn divergence from optimal transport.
View Article and Find Full Text PDFEnter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!