Preactivation ResNets consistently outperform the original postactivation ResNets on the CIFAR-10/100 classification benchmarks. Surprisingly, however, these results do not carry over to the standard ImageNet benchmark. First, we theoretically analyze this incongruity in terms of how the two variants differ in handling the propagation of gradients. Although identity shortcuts are critical in both variants for improving optimization and performance, we show that postactivation variants enable early layers to receive a diverse, dynamic composition of gradients from effectively deeper paths than preactivation variants, allowing the network to make maximal use of its representational capacity. Second, we show that downsampling projections, although few in number, have a significantly detrimental effect on performance. By simply replacing downsampling projections with identity-like dense-reshape shortcuts, the classification results of standard residual architectures such as ResNets, ResNeXts, and SE-Nets improve by up to 1.2% on ImageNet, without any increase in computational complexity (FLOPs).
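The exact dense-reshape shortcut is not specified in this abstract; as a rough illustration only, the following PyTorch sketch shows one plausible parameter-free, identity-like stand-in for the strided 1x1 projection used at ResNet stage transitions. The module name ReshapeShortcut and the channel-folding step are assumptions, not the authors' formulation.

```python
import torch
import torch.nn.functional as F
from torch import nn


class ReshapeShortcut(nn.Module):
    """Sketch of an identity-like downsampling shortcut (assumed form,
    not the paper's exact dense-reshape definition).

    Moves each 2x2 spatial block into the channel dimension
    (space-to-depth), then folds the resulting 4*C channels down to the
    2*C channels expected after a standard ResNet stage transition by
    averaging channel pairs. Unlike a strided 1x1 projection, it has no
    learned parameters and discards no input activations.
    """

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        # (N, C, H, W) -> (N, 4C, H/2, W/2): pure reshape, no information loss
        x = F.pixel_unshuffle(x, downscale_factor=2)
        # fold 4C -> 2C by averaging pairs of adjacent channels
        return x.view(n, 2 * c, 2, h // 2, w // 2).mean(dim=2)


# Usage: stands in for nn.Conv2d(c, 2 * c, kernel_size=1, stride=2) on the
# shortcut branch of a downsampling residual block.
shortcut = ReshapeShortcut()
out = shortcut(torch.randn(8, 64, 56, 56))  # -> torch.Size([8, 128, 28, 28])
```

Because the reshape and mean involve no learned weights and negligible arithmetic, such a replacement keeps the block's FLOP count essentially unchanged, consistent with the claim above.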


Source: http://dx.doi.org/10.1109/TNNLS.2019.2929198
