In the theoretical analysis of deep learning, discovering which features of deep learning lead to good performance is an important task. Using the framework for analyzing the generalization error developed by Suzuki (2018), we derive a fast learning rate for deep neural networks with general activation functions. According to Suzuki (2018), scale invariance of the activation functions is essential to derive tight error bounds. While the rectified linear unit (ReLU; Nair and Hinton, 2010) satisfies scale invariance, other popular activation functions, such as the sigmoid, the hyperbolic tangent, and the exponential linear unit (ELU; Clevert et al., 2016), do not satisfy this condition. The existing analysis suggests that deep learning with non-scale-invariant activations may be limited to a slower convergence rate of O(1/√n), whereas scale-invariant activation functions can attain a faster rate. In this paper, without assuming scale invariance of the activation functions, we derive a tight generalization error bound that is essentially the same as that of Suzuki (2018). This result shows that, at least within the framework of Suzuki (2018), scale invariance of the activation functions is not essential to obtain a fast rate of convergence. We also conclude that the theoretical framework proposed by Suzuki (2018) can be widely applied to the analysis of deep learning with general activation functions.
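As a brief illustration of the scale-invariance (positive homogeneity) condition discussed in the abstract, the following sketch (not part of the paper; the function names, test points, and tolerance are illustrative assumptions) numerically checks whether σ(a·x) = a·σ(x) holds for a > 0 for ReLU, sigmoid, tanh, and ELU.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def is_scale_invariant(act, xs, scales, tol=1e-8):
    # Scale invariance (positive homogeneity): act(a * x) == a * act(x) for all a > 0.
    return all(np.allclose(act(a * xs), a * act(xs), atol=tol) for a in scales)

xs = np.linspace(-3.0, 3.0, 101)
scales = [0.5, 2.0, 10.0]
for name, act in [("ReLU", relu), ("sigmoid", sigmoid), ("tanh", np.tanh), ("ELU", elu)]:
    print(name, is_scale_invariant(act, xs, scales))
# Expected: only ReLU prints True; sigmoid, tanh, and ELU are not positively homogeneous.
```

This check only demonstrates the property distinguishing ReLU from the other activations; the paper's contribution is the error analysis that dispenses with this condition.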
DOI: http://dx.doi.org/10.1016/j.neunet.2020.05.033