CoolMomentum: a method for stochastic optimization by Langevin dynamics with simulated annealing

Bibliographic details
Main authors: Oleksandr Borysenko, Maksym Byshkin
Format: Article
Language: English
Published: Nature Portfolio, 2021
Subjects: Medicine (R); Science (Q)
Online access: https://doaj.org/article/9ae814904dc54f65a9749af760cb5d73
Record ID: oai:doaj.org-article:9ae814904dc54f65a9749af760cb5d73
DOI: 10.1038/s41598-021-90144-3 (https://doi.org/10.1038/s41598-021-90144-3)
Journal: Scientific Reports, Vol 11, Iss 1, Pp 1-8 (2021)
ISSN: 2045-2322 (journal contents: https://doaj.org/toc/2045-2322)
Publication date: 2021-05-01
Source: DOAJ
Record timestamp: 2021-12-02T15:52:55Z
Abstract: Deep learning applications require global optimization of non-convex objective functions, which have multiple local minima. The same problem often arises in physical simulations, where it may be resolved by Langevin dynamics with Simulated Annealing, a well-established approach for minimizing many-particle potentials. This analogy provides useful insights for non-convex stochastic optimization in machine learning. Here we find that integrating the discretized Langevin equation gives a coordinate-update rule equivalent to the well-known Momentum optimization algorithm. As our main result, we show that gradually decreasing the momentum coefficient from an initial value close to unity down to zero is equivalent to applying Simulated Annealing, or slow cooling in physical terms. Building on this equivalence, we propose CoolMomentum, a new stochastic optimization method. Applying CoolMomentum to the optimization of ResNet-20 on the CIFAR-10 dataset and EfficientNet-B0 on ImageNet, we demonstrate that it achieves high accuracy.
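As a rough illustration of the two claims in the abstract, the sketch below does two things. First, it maps a central-difference discretization of the Langevin equation m x'' = -γ x' - U'(x) + R(t) onto the heavy-ball Momentum update x_{t+1} = x_t + ρ(x_t - x_{t-1}) - η U'(x_t). Under that scheme, ρ = (1 - γΔt/2m)/(1 + γΔt/2m) and η = Δt²/(m(1 + γΔt/2m)). Second, it runs that update while ρ is lowered from ρ0 to zero. The discretization is an assumption and may differ from the one used in the paper. The linear schedule, the toy double-well potential, and the helper names `langevin_to_momentum`, `cooled_momentum` and `grad_u` are hypothetical; they are not the authors' CoolMomentum schedule or code. If the stochastic gradient noise is treated as a fixed random force, then by fluctuation-dissipation the effective temperature scales like σ²/γ. Raising the friction γ, which lowers ρ, therefore cools the system.

```python
import random


def langevin_to_momentum(m, gamma, dt):
    """Coefficients (rho, eta) of the heavy-ball update obtained from a
    central-difference discretization (an assumed scheme) of
        m x'' = -gamma x' - U'(x) + R(t),
    i.e. x_{t+1} = x_t + rho * (x_t - x_{t-1}) - eta * (U'(x_t) - R_t)."""
    c = gamma * dt / (2.0 * m)           # dimensionless friction per half step
    rho = (1.0 - c) / (1.0 + c)          # momentum coefficient: 1 when gamma = 0
    eta = dt ** 2 / (m * (1.0 + c))      # effective learning rate
    return rho, eta


def cooled_momentum(grad, x0, lr, rho0, n_steps, noise_std=0.0, seed=0):
    """Heavy-ball momentum on a scalar x with rho lowered linearly from rho0
    to 0. The linear schedule is a hypothetical stand-in for illustration."""
    rng = random.Random(seed)
    x, v = float(x0), 0.0                # v holds x_t - x_{t-1}
    for t in range(n_steps):
        rho = rho0 * (1.0 - t / max(n_steps - 1, 1))
        # Gaussian noise mimics the random force / minibatch gradient noise.
        g = grad(x) + noise_std * rng.gauss(0.0, 1.0)
        v = rho * v - lr * g
        x += v
    return x


def grad_u(x):
    """Gradient of the toy tilted double well U(x) = x**4 - 3*x**2 + x."""
    return 4.0 * x ** 3 - 6.0 * x + 1.0


if __name__ == "__main__":
    print(langevin_to_momentum(m=1.0, gamma=0.2, dt=0.1))
    print(cooled_momentum(grad_u, x0=2.0, lr=0.01, rho0=0.99,
                          n_steps=2000, noise_std=1.0))
```

With `noise_std=0` the loop reduces to deterministic heavy-ball descent with a shrinking momentum coefficient. With noise, a larger `rho0` corresponds under this analogy to a hotter start, which the schedule then cools toward plain gradient descent.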