Grade Inflation in Generative Models.

ArXiv

Department of Pathology and Division of Clinical Informatics, Department of Medicine, BIDMC, and Harvard Medical School, Boston, MA 02215.

Published: January 2025

Generative models hold great potential, but only if one can trust the evaluation of the data they generate. We show that many commonly used quality scores for comparing two-dimensional distributions of synthetic vs. ground-truth data give better results than they should, a phenomenon we call the "grade inflation problem." In particular, the correlation score, Jaccard score, earth-mover's score, and Kullback-Leibler (relative-entropy) score all suffer from grade inflation. We propose that any score that values all datapoints equally, as these do, will also exhibit grade inflation; we refer to such scores as "equipoint" scores. We introduce the concept of "equidensity" scores and present the Eden score, to our knowledge the first example of such a score. We find that Eden avoids grade inflation and agrees better with human perception of goodness-of-fit than the equipoint scores above. We propose that any reasonable equidensity score will avoid grade inflation. We identify a connection between equidensity scores and Rényi entropy of negative order. We conclude that equidensity scores are likely to outperform equipoint scores for generative models, and for comparing low-dimensional distributions more generally.
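To make the scores named in the abstract concrete, the following is a minimal sketch of three of them (correlation, Jaccard, and Kullback-Leibler) computed on binned two-dimensional samples. The abstract does not give the paper's exact definitions, so everything here is an assumption for illustration: the function names, the 20x20 binning on the unit square, the weighted (min/max) form of the Jaccard score, and the `eps` smoothing are all hypothetical choices. The earth-mover's score and the Eden score are not sketched, since their details are not stated in the abstract.

```python
import numpy as np


def histogram_2d(points, bins=20, value_range=((0.0, 1.0), (0.0, 1.0))):
    """Bin an (n, 2) array of points into a 2D histogram that sums to 1.

    The bin count and range are illustrative assumptions, not the paper's.
    """
    counts, _, _ = np.histogram2d(
        points[:, 0], points[:, 1], bins=bins, range=value_range
    )
    return counts / counts.sum()


def correlation_score(p, q):
    """Pearson correlation between the bin masses of two histograms."""
    return float(np.corrcoef(p.ravel(), q.ravel())[0, 1])


def jaccard_score(p, q):
    """Weighted Jaccard similarity: shared mass divided by combined mass."""
    return float(np.minimum(p, q).sum() / np.maximum(p, q).sum())


def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) in nats.

    eps is added to every bin (an assumption) so empty bins do not
    produce log(0) or division by zero; both inputs are renormalized.
    """
    p = p.ravel() + eps
    q = q.ravel() + eps
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))


# Hypothetical data: two samples drawn from the same 2D Gaussian.
rng = np.random.default_rng(0)
truth = histogram_2d(rng.normal(0.5, 0.1, size=(5000, 2)))
synth = histogram_2d(rng.normal(0.5, 0.1, size=(5000, 2)))

print("correlation:", correlation_score(truth, synth))
print("jaccard:    ", jaccard_score(truth, synth))
print("KL:         ", kl_divergence(truth, synth))
```

In each of these sketches a bin contributes in proportion to the probability mass it holds, so the densest regions dominate the result. That may be one way to read the abstract's description of scores that value all datapoints equally, but this reading is an interpretation rather than the paper's stated definition.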

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11722526
