This paper investigates the approximation properties of deep neural networks with piecewise-polynomial activation functions. We derive the depth, width, and sparsity that a deep neural network requires to approximate any Hölder-smooth function to within a given error in Hölder norms, with all weights of the network bounded by 1. This bound on the weights is essential for controlling generalization error in many statistical and machine learning applications.
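To illustrate why piecewise-polynomial activations pair well with a bound on the weights, the following minimal sketch assumes the rectified quadratic unit max(0, x)^2 as the activation. That choice is an assumption made for illustration, since the abstract does not name a specific activation, and the function names `requ`, `square`, and `product` are hypothetical. With this activation, squaring and multiplication are represented exactly by tiny networks whose weights all lie in [-1, 1].

```python
def requ(x: float) -> float:
    """Rectified quadratic unit max(0, x)^2 (assumed example activation)."""
    return max(0.0, x) ** 2


def square(x: float) -> float:
    """Exact x^2 from two hidden units: requ(x) + requ(-x).

    Hidden-layer weights are +1 and -1, output weights are +1 and +1.
    """
    return requ(x) + requ(-x)


def product(x: float, y: float) -> float:
    """Exact x*y via the identity xy = ((x + y)^2 - (x - y)^2) / 4.

    Four hidden units with input weights in {+1, -1} and
    output weights in {+1/4, -1/4}, so every weight is bounded by 1.
    """
    hidden = [requ(x + y), requ(-x - y), requ(x - y), requ(-x + y)]
    output_weights = [0.25, 0.25, -0.25, -0.25]
    return sum(w * h for w, h in zip(output_weights, hidden))


if __name__ == "__main__":
    print(square(-0.7))
    print(product(0.5, -0.3))
```

Because products are exact here, polynomials can be built by composing such blocks without enlarging the weights. This is only a sketch of the general idea, not the construction used in the paper.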