Quantization approximates a floating-point deep network model with a low-bit-width counterpart, thereby accelerating inference and reducing computation. Zero-shot quantization, which aims to quantize a model without access to the original data, can be achieved by fitting the real data distribution through data synthesis. However, it has been observed that zero-shot quantization yields inferior performance compared to post-training quantization with real data, for two primary reasons: 1) a conventional generator has difficulty producing diverse synthetic data because it lacks the long-range information needed to attend to global features, and 2) synthetic images are optimized to match the statistics of real data, which leads to weak intra-class heterogeneity and limited feature richness.
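For illustration only, here is a minimal sketch of how a floating-point tensor can be approximated with low-bit-width values via symmetric uniform quantization; the function name, per-tensor scaling choice, and bit width are generic assumptions and not the specific scheme studied in the article.

```python
import numpy as np

def uniform_quantize(x, num_bits=8):
    """Symmetric uniform quantization: map float values to signed
    integers of the given bit width, then dequantize back.
    Generic illustration, not the article's exact method."""
    qmax = 2 ** (num_bits - 1) - 1                       # e.g. 127 for 8-bit
    scale = np.max(np.abs(x)) / qmax                     # per-tensor scale factor (assumed)
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)    # low-bit integer codes
    return q * scale                                     # dequantized approximation

# Example: 4-bit quantization of random weights
w = np.random.randn(4, 4).astype(np.float32)
w_q = uniform_quantize(w, num_bits=4)
print(np.mean((w - w_q) ** 2))                           # quantization error
```

In post-training quantization, scale factors like the one above are typically calibrated on real data; zero-shot quantization must instead calibrate on synthesized data, which is why the diversity and feature richness of the synthetic images matter.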