Spiking Neural Network Discovers Energy-Efficient Hexapod Motion in Deep Reinforcement Learning

In Deep Reinforcement Learning (DRL) for robotics applications, it is important to find energy-efficient motions. The standard method is to add an action penalty to the reward so that the optimal motion accounts for energy expenditure, an approach widely used because it is simple to implement. However, since the reward is a linear sum, a penalty that is too large drives the system into a local minimum in which no moving solution is obtained, while a penalty that is too small has little effect. The penalty magnitude must therefore be tuned so that the agent keeps moving dynamically while the energy saving remains substantial. Because tuning such hyperparameters is computationally expensive, a learning method that is robust to the penalty setting is needed. We investigated the Spiking Neural Network (SNN), which has been attracting attention for its computational efficiency and neuromorphic architecture. We conducted gait experiments with a hexapod agent in simulation while varying the energy penalty setting. By applying SNNs to conventional state-of-the-art DRL algorithms, we examined whether the agent could search for an optimal gait across a wider range of penalties and obtain an energy-efficient gait, verified with the Cost of Transport (CoT), a metric of energy efficiency for locomotion. Soft Actor-Critic (SAC)+SNN achieved a CoT of 1.64, Twin Delayed Deep Deterministic policy gradient (TD3)+SNN a CoT of 2.21, and Deep Deterministic policy gradient (DDPG)+SNN a CoT of 2.08, compared with 1.91 for plain SAC, 2.38 for TD3, and 2.40 for DDPG. DRL combined with SNN thus succeeded in learning more energy-efficient gaits with lower CoT.
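As a minimal sketch of the two quantities the abstract relies on, the Python snippet below shows the linear-sum action penalty and the Cost of Transport metric. The function names, the quadratic form of the penalty, and the coefficient `penalty_coeff` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def shaped_reward(forward_velocity, action, penalty_coeff=0.5):
    # Linear-sum reward: task term minus a weighted action (energy) penalty.
    # `penalty_coeff` is the hyperparameter the abstract describes as hard to
    # tune: too large and the agent settles into a non-moving local minimum;
    # too small and the energy saving is negligible.
    # The quadratic penalty on the action vector is an illustrative assumption.
    return forward_velocity - penalty_coeff * float(np.sum(np.square(action)))

def cost_of_transport(energy, mass, distance, g=9.81):
    # CoT = E / (m * g * d): energy spent per unit weight per unit distance
    # travelled; dimensionless, lower means a more efficient gait.
    return energy / (mass * g * distance)
```

On this metric, the reported 1.64 for SAC+SNN versus 1.91 for plain SAC corresponds to roughly a 14% reduction in energy per unit distance travelled.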


Bibliographic Details
Main Authors: Katsumi Naya, Kyo Kutsuzawa, Dai Owaki, Mitsuhiro Hayashibe
Format: Article
Language: EN
Published: IEEE, 2021
Subjects: Spiking neural network; deep reinforcement learning; energy efficiency; hexapod gait; spatio-temporal backpropagation
Online Access: https://doaj.org/article/335681a7613746e5a4b027d670d343bf
DOI: 10.1109/ACCESS.2021.3126311
ISSN: 2169-3536
Full Text: https://ieeexplore.ieee.org/document/9606760/
Published in: IEEE Access, Vol. 9, pp. 150345-150354 (2021)
LCC Classification: Electrical engineering. Electronics. Nuclear engineering (TK1-9971)