Calibrating the Adaptive Learning Rate to Improve Convergence of ADAM.

Neurocomputing (Amst)

Computer Science and Engineering, University of Connecticut, Storrs, CT 06269.

Published: April 2022

Adaptive gradient methods (AGMs) have become popular for optimizing nonconvex problems in deep learning. We revisit AGMs and observe that the adaptive learning rate (A-LR) they use varies significantly across the dimensions of the problem and over epochs (i.e., it is anisotropic in scale), which may lead to issues in convergence and generalization. Existing modified AGMs can in fact be viewed as different efforts to revise the A-LR. Theoretically, we provide a new way to analyze the convergence of AGMs and prove that the convergence rate of Adam also depends on its hyper-parameter ε, which has previously been overlooked. Based on these two observations, we propose a new AGM that calibrates the A-LR with an activation (softplus) function, resulting in the Sadam and SAMSGrad methods. We further prove that these algorithms enjoy better convergence speed than Adam under nonconvex, non-strongly convex, and Polyak-Łojasiewicz conditions. Empirical studies support our observation of the anisotropic A-LR and show that the proposed methods outperform existing AGMs and generalize even better than S-Momentum in multiple deep learning tasks.
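To connect the abstract's description to an implementation, below is a minimal NumPy sketch of a softplus-calibrated Adam-style step. It assumes the calibration replaces the usual sqrt(v_t) + ε denominator with a softplus applied to sqrt(v_t); the function names, the beta3 smoothness parameter, and the default values are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

def softplus(x, beta=50.0):
    # Numerically stable softplus_beta(x) = log(1 + exp(beta * x)) / beta.
    # For large beta this closely tracks max(x, 0) while staying bounded away from 0.
    return np.logaddexp(0.0, beta * x) / beta

def sadam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, beta3=50.0):
    """One hypothetical Sadam-style update: Adam with its per-coordinate
    denominator passed through a softplus instead of adding a fixed eps."""
    m = beta1 * m + (1.0 - beta1) * grad          # biased first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad ** 2     # biased second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                # bias corrections (t starts at 1)
    v_hat = v / (1.0 - beta2 ** t)
    # Calibrated adaptive learning rate: softplus smooths and lower-bounds
    # sqrt(v_hat), limiting the extreme per-dimension scales and the ε
    # sensitivity discussed in the abstract.
    param = param - lr * m_hat / softplus(np.sqrt(v_hat), beta3)
    return param, m, v

# Toy usage: minimize f(x) = ||x||^2 for a 3-dimensional parameter.
x = np.array([1.0, -2.0, 0.5]); m = np.zeros(3); v = np.zeros(3)
for t in range(1, 201):
    x, m, v = sadam_step(x, 2.0 * x, m, v, t)
```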


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8951388
DOI: http://dx.doi.org/10.1016/j.neucom.2022.01.014

Publication Analysis

Top Keywords

adaptive learning (8)
learning rate (8)
deep learning (8)
agms (6)
convergence (5)
calibrating adaptive (4)
learning (4)
rate improve (4)
improve convergence (4)
convergence adam (4)
