A Zeroth-Order Adaptive Learning Rate Method to Reduce Cost of Hyperparameter Tuning for Deep Learning

Due to its powerful data representation ability, deep learning has dramatically improved the state of the art in many practical applications. However, its utility depends heavily on the fine-tuning of hyper-parameters, including the learning rate, batch size, and network initialization. Although many first-order adaptive methods (e.g., Adam, Adagrad) have been proposed to adjust the learning rate based on gradients, they remain sensitive to the initial learning rate and the network architecture. The main challenge of using deep learning in practice is therefore how to reduce the cost of tuning hyper-parameters. To address this, we propose a heuristic zeroth-order learning rate method, Adacomp, which adaptively adjusts the learning rate based only on values of the loss function. The main idea is that Adacomp penalizes large learning rates to ensure convergence and compensates small learning rates to accelerate training. As a result, Adacomp is robust to the initial learning rate. Extensive experiments were conducted, including comparisons with six typical adaptive methods (Momentum, Adagrad, RMSprop, Adadelta, Adam, and Adamax) on several benchmark image classification datasets (MNIST, KMNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100). The results show that Adacomp is robust not only to the initial learning rate but also to the network architecture, network initialization, and batch size.
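
The abstract describes Adacomp only at a high level: the learning rate is adjusted from loss values alone (zeroth-order), shrinking it when a step makes the loss worse and enlarging it when progress is safe. The minimal sketch below illustrates that general idea on a toy quadratic problem; the accept/reject rule and the multiplicative factors shrink and grow are assumptions chosen for illustration, not the published Adacomp update.

# Illustrative sketch of a zeroth-order (loss-value-driven) learning-rate
# controller. The rejection step and the factors `shrink`/`grow` are
# assumptions for illustration, not the published Adacomp rule.
import numpy as np

A = np.diag([1.0, 10.0])          # toy quadratic standing in for a training loss

def loss(w):
    return 0.5 * w @ A @ w

def grad(w):
    return A @ w

w = np.array([5.0, 5.0])
lr = 1.0                          # deliberately poor initial learning rate
shrink, grow = 0.5, 1.1           # assumed adaptation factors
prev_loss = loss(w)

for step in range(60):
    w_trial = w - lr * grad(w)    # ordinary gradient step
    trial_loss = loss(w_trial)    # only loss values drive the adaptation
    if trial_loss > prev_loss:
        lr *= shrink              # penalize: the step overshot, retry with a smaller rate
    else:
        w, prev_loss = w_trial, trial_loss
        lr *= grow                # compensate: safe progress, allow a larger rate

print(f"final loss {prev_loss:.3e}, final learning rate {lr:.3f}")

Even though the initial rate is far above the stable range for this toy problem, the controller shrinks it until steps start reducing the loss and then keeps it near the largest value that still makes progress, which mirrors the robustness to the initial learning rate claimed in the abstract.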


Saved in:
Bibliographic Details
Main Authors: Yanan Li, Xuebin Ren, Fangyuan Zhao, Shusen Yang
Format: Article
Language: English
Published: MDPI AG, 2021
Published in: Applied Sciences, Vol. 11, Iss. 21, Art. 10184 (2021)
DOI: 10.3390/app112110184
ISSN: 2076-3417
Subjects: deep learning; adaptive learning rate; robustness; stochastic gradient descent; Technology (T); Engineering (General). Civil engineering (General) (TA1-2040); Biology (General) (QH301-705.5); Physics (QC1-999); Chemistry (QD1-999)
Online Access: https://doaj.org/article/fea104c85f094c74b56e338e92eeb8ae
https://www.mdpi.com/2076-3417/11/21/10184