Despite the widespread success of deep learning in various applications, neural network theory has been lagging behind. The choice of the activation function plays a critical role in the expressivity of a neural network but for reasons that are not yet fully understood. While the rectified linear unit (ReLU) is currently one of the most popular activation functions, ReLU squared has only recently been empirically shown to be pivotal in producing consistently superior results for state-of-the-art deep learning tasks (So et al.
View Article and Find Full Text PDF