This paper presents a novel natural gradient and Hessian-free (NGHF) optimisation framework for neural network training that can operate efficiently in a distributed manner. It relies on the linear conjugate gradient (CG) algorithm to combine the natural gradient (NG) method with local curvature information from the Hessian-free (HF) method. A solution to a numerical issue in CG allows effective parameter updates to be generated with far fewer CG iterations than are typically used (e.
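
As a rough illustration of the role linear CG plays in this kind of method, the sketch below solves a system A x = b using only matrix-vector products and stops after a small, fixed number of iterations. It is a generic textbook CG, not the NGHF algorithm and not the paper's fix for the numerical issue. The names `truncated_cg`, `matvec`, `max_iters`, and `tol` are hypothetical; in an HF or NG setting, `matvec` would stand in for a curvature-vector product and `b` for the negative gradient.

```python
from typing import Callable, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def truncated_cg(
    matvec: Callable[[Vector], Vector],
    b: Vector,
    max_iters: int = 8,
    tol: float = 1e-10,
) -> Vector:
    """Approximately solve A x = b for symmetric positive-definite A.

    A is never formed explicitly; it is accessed only through matvec(v) = A v.
    Iteration stops after max_iters steps, or earlier once the squared
    residual norm falls to tol or below.
    """
    x = [0.0] * len(b)   # start from the zero vector
    r = list(b)          # residual b - A x, which equals b when x = 0
    p = list(r)          # first search direction is the residual
    rs = dot(r, r)
    for _ in range(max_iters):
        if rs <= tol:
            break
        Ap = matvec(p)
        pAp = dot(p, Ap)
        if pAp <= 0.0:
            # Non-positive curvature along p: A is not positive definite here,
            # so stop rather than take an unreliable step.
            break
        alpha = rs / pAp
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


# Toy 2x2 symmetric positive-definite system (illustrative values only).
A = [[4.0, 1.0], [1.0, 3.0]]


def toy_matvec(v: Vector) -> Vector:
    return [dot(row, v) for row in A]


one_step = truncated_cg(toy_matvec, [1.0, 2.0], max_iters=1)
full = truncated_cg(toy_matvec, [1.0, 2.0])
```

Here `one_step` is the result of a single CG iteration and `full` is the converged solution. Capping `max_iters` is the knob that trades update quality against cost, since each iteration needs one curvature-vector product.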