A Zeroth-Order Adaptive Learning Rate Method to Reduce Cost of Hyperparameter Tuning for Deep Learning
Due to its powerful data representation ability, deep learning has dramatically improved the state of the art in many practical applications. However, its utility depends heavily on the fine-tuning of hyper-parameters, including the learning rate, batch size, and network initialization. Although many first-order adaptive methods (e.g., Adam, Adagrad) have been proposed to adjust the learning rate based on gradients, they remain sensitive to the initial learning rate and the network architecture. The main challenge of using deep learning in practice is therefore reducing the cost of tuning hyper-parameters. To address this, we propose a heuristic zeroth-order learning rate method, Adacomp, which adaptively adjusts the learning rate based only on values of the loss function. The main idea is that Adacomp penalizes large learning rates to ensure convergence and compensates small learning rates to accelerate training; as a result, Adacomp is robust to the initial learning rate. Extensive experiments were conducted, including comparisons with six typical adaptive methods (Momentum, Adagrad, RMSprop, Adadelta, Adam, and Adamax) on several benchmark image classification datasets (MNIST, KMNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100). The results show that Adacomp is robust not only to the initial learning rate but also to the network architecture, network initialization, and batch size.
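The abstract describes Adacomp only at the level of its guiding intuition, so the sketch below is a hedged illustration of that intuition rather than the authors' algorithm: after each optimizer step, a zeroth-order controller shrinks the learning rate if the loss went up and gently enlarges it if the loss went down, using nothing but the observed loss values. The function name `loss_based_lr_step`, the multiplicative factors `shrink` and `grow`, and the PyTorch-style training loop are all assumptions made for illustration.

```python
# Illustrative sketch only -- this record does not give Adacomp's actual update
# rule. It shows the general zeroth-order idea from the abstract: adjust the
# learning rate from loss values alone, with no gradient statistics.

def loss_based_lr_step(optimizer, prev_loss, curr_loss,
                       shrink=0.5, grow=1.1, lr_min=1e-6, lr_max=1.0):
    """Penalize a learning rate that increased the loss; otherwise let it grow slowly."""
    if prev_loss is None:
        return  # first step: nothing to compare against yet
    for group in optimizer.param_groups:  # works with any PyTorch-style optimizer
        if curr_loss > prev_loss:
            # the last step was too aggressive: cut the learning rate back
            group["lr"] = max(group["lr"] * shrink, lr_min)
        else:
            # the last step helped: compensate a small rate by growing it slightly
            group["lr"] = min(group["lr"] * grow, lr_max)

# Assumed usage inside an ordinary training loop (model, loss_fn, loader, and a
# torch.optim.SGD optimizer are presumed to exist):
#
#   prev_loss = None
#   for x, y in loader:
#       optimizer.zero_grad()
#       loss = loss_fn(model(x), y)
#       loss.backward()
#       optimizer.step()
#       loss_based_lr_step(optimizer, prev_loss, loss.item())
#       prev_loss = loss.item()
```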
Saved in:
Main Authors: | Yanan Li, Xuebin Ren, Fangyuan Zhao, Shusen Yang |
---|---|
Format: | article |
Language: | EN |
Published: | MDPI AG, 2021 |
Subjects: | deep learning; adaptive learning rate; robustness; stochastic gradient descent |
Online Access: | https://doaj.org/article/fea104c85f094c74b56e338e92eeb8ae |
id |
oai:doaj.org-article:fea104c85f094c74b56e338e92eeb8ae |
---|---|
record_format |
dspace |
doi |
10.3390/app112110184 |
issn |
2076-3417 |
published_online |
2021-10-01 |
fulltext_url |
https://www.mdpi.com/2076-3417/11/21/10184 |
journal_toc |
https://doaj.org/toc/2076-3417 |
citation |
Applied Sciences, Vol 11, Iss 10184, p 10184 (2021) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
deep learning; adaptive learning rate; robustness; stochastic gradient descent; Technology (T); Engineering (General). Civil engineering (General) (TA1-2040); Biology (General) (QH301-705.5); Physics (QC1-999); Chemistry (QD1-999) |
format |
article |
author |
Yanan Li, Xuebin Ren, Fangyuan Zhao, Shusen Yang |
title |
A Zeroth-Order Adaptive Learning Rate Method to Reduce Cost of Hyperparameter Tuning for Deep Learning |
publisher |
MDPI AG |
publishDate |
2021 |
url |
https://doaj.org/article/fea104c85f094c74b56e338e92eeb8ae |