NAEM: Noisy Attention Exploration Module for Deep Reinforcement Learning

Bibliographic Details
Main authors: Zhenwen Cai, Feifei Lee, Chunyan Hu, Koji Kotani, Qiu Chen
Format: article
Language: EN
Published: IEEE 2021
Online access: https://doaj.org/article/2f304b4eae664799bacde718e8aa78b3
Description
Summary: Deep reinforcement learning (RL) has recently attracted wide attention because of its strong ability to solve complex decision-making tasks. Although deep RL has achieved remarkable results in many fields, efficient exploration remains one of its most challenging issues. Conventional exploration heuristics, such as random exploration, have proven unsuitable for complex RL tasks with high-dimensional states and large action spaces. In this paper, we propose a novel, lightweight, and general neural network module for effective global exploration, called the Noisy Attention Exploration Module (NAEM). Its key insight is to introduce parametric, learnable Gaussian noise into the attention mechanism. NAEM is a general structure based on the Convolutional Block Attention Module (CBAM) and retains the ability of attention to enhance feature representation for any CNN. To evaluate the module, we embed it into both value-based and actor-critic RL algorithms and test their performance improvement over related agents. The experimental results show that both modified agents achieve a performance improvement of more than 130% on most Atari games compared with their original versions. In addition, for NoisyNet agents, training time can be reduced by about 30% by using NAEM as an alternative exploration mechanism.
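
The summary's core idea, learnable Gaussian noise inside an attention block, can be illustrated with a small sketch. This is not the authors' implementation: the record does not say where NAEM injects the noise. The sketch assumes a NoisyNet-style parameterization (w = mu + sigma * eps) applied to the shared MLP of a CBAM-style channel attention block. All function names, the `sigma0` initial noise scale, and the initialization scheme are hypothetical, and training of `mu` and `sigma` is omitted.

```python
import math
import random


def sample_weights(mu, sigma, rng):
    """Draw one noisy weight matrix: w = mu + sigma * eps, eps ~ N(0, 1).

    Assumption: NoisyNet-style independent Gaussian noise per weight.
    """
    return [[m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu_row, sigma_row)]
            for mu_row, sigma_row in zip(mu, sigma)]


def linear(weights, x):
    """Plain matrix-vector product (bias omitted for brevity)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def init_params(channels, reduction, sigma0, rng):
    """Hypothetical initializer: uniform means, constant noise scale sigma0."""
    hidden = max(1, channels // reduction)

    def uniform(rows, cols):
        bound = 1.0 / math.sqrt(cols)
        return [[rng.uniform(-bound, bound) for _ in range(cols)]
                for _ in range(rows)]

    def constant(rows, cols):
        return [[sigma0] * cols for _ in range(rows)]

    return {
        "mu1": uniform(hidden, channels), "sigma1": constant(hidden, channels),
        "mu2": uniform(channels, hidden), "sigma2": constant(channels, hidden),
    }


def noisy_channel_attention(fmap, params, rng):
    """CBAM-style channel attention with noisy shared-MLP weights.

    fmap is a list of C channels, each a flat list of spatial activations.
    Returns the rescaled feature map and the per-channel gates.
    """
    # Sample the weights once per forward pass so that the average-pooled
    # and max-pooled descriptors go through the same (shared) noisy MLP.
    w1 = sample_weights(params["mu1"], params["sigma1"], rng)
    w2 = sample_weights(params["mu2"], params["sigma2"], rng)

    def shared_mlp(v):
        hidden = [max(0.0, h) for h in linear(w1, v)]  # ReLU
        return linear(w2, hidden)

    avg_pool = [sum(ch) / len(ch) for ch in fmap]
    max_pool = [max(ch) for ch in fmap]
    logits = [a + m for a, m in zip(shared_mlp(avg_pool), shared_mlp(max_pool))]
    gates = [1.0 / (1.0 + math.exp(-z)) for z in logits]  # sigmoid
    scaled = [[g * v for v in ch] for g, ch in zip(gates, fmap)]
    return scaled, gates


# Toy input: 4 channels with 4 spatial positions each.
rng = random.Random(0)
fmap = [[0.1, 0.5, -0.2, 0.3], [1.0, 0.0, 0.2, 0.4],
        [0.3, 0.3, 0.3, 0.3], [-0.5, 0.2, 0.1, 0.9]]
params = init_params(channels=4, reduction=2, sigma0=0.5, rng=rng)
_, gates_a = noisy_channel_attention(fmap, params, rng)
_, gates_b = noisy_channel_attention(fmap, params, rng)
print(gates_a)
print(gates_b)
```

Because the noise is resampled on every forward pass, the same input generally yields different attention gates from call to call, which is one way perturbed features could drive exploration. Setting `sigma0` to zero makes the block behave as ordinary deterministic channel attention.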