AdaCN: An Adaptive Cubic Newton Method for Nonconvex Stochastic Optimization
Main authors:
Format: article
Language: EN
Published: Hindawi Limited, 2021
Subjects:
Online access: https://doaj.org/article/09154e3ff5c64b9a8f24e212323ae4d8
Summary: In this work, we introduce AdaCN, a novel adaptive cubic Newton method for nonconvex stochastic optimization. AdaCN dynamically captures the curvature of the loss landscape through a diagonally approximated Hessian plus the norm of the difference between the two previous estimates. It requires at most first-order gradients and performs updates with linear complexity in both time and memory. To reduce the variance introduced by the stochastic nature of the problem, AdaCN uses the first and second moments to apply an exponential moving average to the iteratively updated stochastic gradients and the approximated stochastic Hessians, respectively. We validate AdaCN in extensive experiments, showing that it outperforms other stochastic first-order methods (including SGD, Adam, and AdaBound) as well as a stochastic quasi-Newton method (Apollo) in terms of both convergence speed and generalization performance.
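The abstract describes the method's ingredients only at a high level; the paper's exact update rules are not reproduced in this record. As a rough illustration of the pieces it names (an EMA-smoothed first moment of the gradients, a diagonal Hessian surrogate stabilized by the norm of the difference between successive estimates, and a cubic-regularized diagonal Newton step), here is a minimal NumPy sketch. The function name `adacn_step`, the secant-style diagonal curvature estimate, and the parameters `lr`, `beta1`, `beta2`, and `sigma` are all assumptions for illustration, not the authors' algorithm.

```python
import numpy as np

def adacn_step(x, grad, state, lr=0.1, beta1=0.9, beta2=0.999,
               sigma=1.0, eps=1e-8):
    """One AdaCN-style update (illustrative sketch, not the paper's rule).

    x     : current parameters (1-D array)
    grad  : stochastic gradient at x
    state : dict carrying the EMAs and the previous (x, grad) pair
    """
    m = state.get("m", np.zeros_like(x))   # first moment: EMA of gradients
    D = state.get("D", np.ones_like(x))    # second moment: EMA of diagonal Hessian
    prev = state.get("prev")

    # Exponential moving average of the stochastic gradients.
    m = beta1 * m + (1 - beta1) * grad

    if prev is not None:
        prev_x, prev_g = prev
        s = x - prev_x
        y = grad - prev_g
        # Diagonal secant estimate of the Hessian (assumed form), plus the
        # norm of the difference between the previous two estimates, as the
        # abstract describes; EMA-smoothed like the gradient.
        denom = np.where(np.abs(s) > eps, s, eps)
        d_inst = np.abs(y / denom)
        D = beta2 * D + (1 - beta2) * (d_inst + np.linalg.norm(d_inst - D))

    # Closed-form minimizer of the separable cubic model
    #   m_i * s + 0.5 * D_i * s^2 + (sigma / 6) * |s|^3
    # per coordinate: s_i = -sign(m_i) * t_i, where
    #   t_i = (-D_i + sqrt(D_i^2 + 2 * sigma * |m_i|)) / sigma.
    t = (-D + np.sqrt(D * D + 2.0 * sigma * np.abs(m))) / sigma
    step = -np.sign(m) * t

    state.update(m=m, D=D, prev=(x.copy(), grad.copy()))
    return x + lr * step, state


# Toy usage on the nonconvex scalar-separable f(x) = sum(x**4 - x**2),
# whose minima sit near +/- 1/sqrt(2).
x, state = np.array([2.0, -1.5]), {}
for _ in range(200):
    g = 4 * x**3 - 2 * x            # exact gradient of f
    x, state = adacn_step(x, g, state)
print(x)                            # iterates approach stationary points
```

Note that, consistent with the abstract's claims, each step touches only vectors of the parameter dimension (no full Hessian is ever formed), so time and memory cost are linear in the number of parameters, and only gradient evaluations are required.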