MarsExplorer: Exploration of Unknown Terrains via Deep Reinforcement Learning and Procedurally Generated Environments
This paper is an initial endeavor to bridge the gap between powerful Deep Reinforcement Learning methodologies and the problem of exploration/coverage of unknown terrains. Within this scope, MarsExplorer, an OpenAI Gym-compatible environment tailored to exploration/coverage of unknown areas, is pres...
Main authors: | Dimitrios I. Koutras, Athanasios C. Kapoutsis, Angelos A. Amanatiadis, Elias B. Kosmatopoulos |
---|---|
Format: | article |
Language: | EN |
Published: | MDPI AG, 2021 |
Subjects: | Deep Reinforcement Learning; OpenAI gym; exploration; unknown terrains; Electronics; TK7800-8360 |
Online access: | https://doaj.org/article/89b0f32561004c3ea99cdb0421579ba6 |
id |
oai:doaj.org-article:89b0f32561004c3ea99cdb0421579ba6 |
---|---|
record_format |
dspace |
spelling |
oai:doaj.org-article:89b0f32561004c3ea99cdb0421579ba6; 2021-11-25T17:24:21Z; MarsExplorer: Exploration of Unknown Terrains via Deep Reinforcement Learning and Procedurally Generated Environments; 10.3390/electronics10222751; 2079-9292; https://doaj.org/article/89b0f32561004c3ea99cdb0421579ba6; 2021-11-01T00:00:00Z; https://www.mdpi.com/2079-9292/10/22/2751; https://doaj.org/toc/2079-9292; Dimitrios I. Koutras; Athanasios C. Kapoutsis; Angelos A. Amanatiadis; Elias B. Kosmatopoulos; MDPI AG; article; Deep Reinforcement Learning; OpenAI gym; exploration; unknown terrains; Electronics; TK7800-8360; EN; Electronics, Vol 10, Iss 2751, p 2751 (2021) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
Deep Reinforcement Learning; OpenAI gym; exploration; unknown terrains; Electronics; TK7800-8360 |
spellingShingle |
Deep Reinforcement Learning; OpenAI gym; exploration; unknown terrains; Electronics; TK7800-8360; Dimitrios I. Koutras; Athanasios C. Kapoutsis; Angelos A. Amanatiadis; Elias B. Kosmatopoulos; MarsExplorer: Exploration of Unknown Terrains via Deep Reinforcement Learning and Procedurally Generated Environments |
description |
This paper is an initial endeavor to bridge the gap between powerful Deep Reinforcement Learning methodologies and the problem of exploration/coverage of unknown terrains. Within this scope, MarsExplorer, an OpenAI Gym-compatible environment tailored to exploration/coverage of unknown areas, is presented. MarsExplorer translates the original robotics problem into a Reinforcement Learning setup that various off-the-shelf algorithms can tackle. Any learned policy can be applied directly to a robotic platform, without requiring an elaborate simulation model of the robot’s dynamics or a separate learning/adaptation phase. One of its core features is the controllable multi-dimensional procedural generation of terrains, which is key to producing policies with strong generalization capabilities. Four state-of-the-art RL algorithms (A3C, PPO, Rainbow, and SAC) are trained on the MarsExplorer environment, and their results are evaluated against average human-level performance. The follow-up experimental analysis examines the effect of the multi-dimensional difficulty setting on the learning capabilities of the best-performing algorithm (PPO). A milestone result is the generation of an exploration policy that follows the Hilbert curve, even though this information is not provided to the environment and Hilbert-curve-like trajectories are not rewarded, directly or indirectly. The experimental analysis concludes by evaluating the PPO-learned policy side by side with frontier-based exploration strategies. A study of the performance curves revealed that the PPO-based policy was capable of sweeping adaptively over the unknown terrain without leaving expensive-to-revisit areas uncovered, underscoring the capability of RL-based methodologies to tackle exploration tasks efficiently. |
format |
article |
author |
Dimitrios I. Koutras; Athanasios C. Kapoutsis; Angelos A. Amanatiadis; Elias B. Kosmatopoulos |
author_sort |
Dimitrios I. Koutras |
title |
MarsExplorer: Exploration of Unknown Terrains via Deep Reinforcement Learning and Procedurally Generated Environments |
title_sort |
marsexplorer: exploration of unknown terrains via deep reinforcement learning and procedurally generated environments |
publisher |
MDPI AG |
publishDate |
2021 |
url |
https://doaj.org/article/89b0f32561004c3ea99cdb0421579ba6 |
work_keys_str_mv |
AT dimitriosikoutras marsexplorerexplorationofunknownterrainsviadeepreinforcementlearningandprocedurallygeneratedenvironments AT athanasiosckapoutsis marsexplorerexplorationofunknownterrainsviadeepreinforcementlearningandprocedurallygeneratedenvironments AT angelosaamanatiadis marsexplorerexplorationofunknownterrainsviadeepreinforcementlearningandprocedurallygeneratedenvironments AT eliasbkosmatopoulos marsexplorerexplorationofunknownterrainsviadeepreinforcementlearningandprocedurallygeneratedenvironments |
_version_ |
1718412420154654720 |
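
The description field of this record mentions a Gym-compatible exploration environment with procedurally generated terrains. As a rough illustration only, the sketch below shows what a Gym-style (reset/step) grid exploration environment with a new random obstacle layout per episode might look like. It is not MarsExplorer's actual code or API: the class name `ToyExplorationEnv`, the cell codes, the square sensor window, the reward of one point per newly revealed cell, and the -1 penalty for a blocked move are all assumptions made for this example, and it uses only the Python standard library rather than the `gym` package.

```python
import random

# Hypothetical cell codes for this sketch (not taken from MarsExplorer).
UNKNOWN, FREE, OBSTACLE = 0, 1, 2
# Hypothetical discrete actions: 0=up, 1=down, 2=left, 3=right.
MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}


class ToyExplorationEnv:
    """Gym-style (reset/step) grid exploration toy; an illustrative assumption."""

    def __init__(self, size=8, n_obstacles=6, sensor_range=1, seed=None):
        self.size = size
        self.n_obstacles = n_obstacles
        self.sensor_range = sensor_range
        self.rng = random.Random(seed)

    def reset(self):
        # Procedural generation: draw a fresh random obstacle layout each episode.
        cells = [(r, c) for r in range(self.size) for c in range(self.size)]
        cells.remove((0, 0))  # keep the start cell free
        self.obstacles = set(self.rng.sample(cells, self.n_obstacles))
        self.known = [[UNKNOWN] * self.size for _ in range(self.size)]
        self.pos = (0, 0)
        self._sense()
        return self._observation()

    def _sense(self):
        # Reveal the square window around the agent; return the number of new cells.
        newly_revealed = 0
        r0, c0 = self.pos
        k = self.sensor_range
        for r in range(max(0, r0 - k), min(self.size, r0 + k + 1)):
            for c in range(max(0, c0 - k), min(self.size, c0 + k + 1)):
                if self.known[r][c] == UNKNOWN:
                    is_obstacle = (r, c) in self.obstacles
                    self.known[r][c] = OBSTACLE if is_obstacle else FREE
                    newly_revealed += 1
        return newly_revealed

    def _observation(self):
        # Observation: a copy of the partially known map plus the agent position.
        return [row[:] for row in self.known], self.pos

    def step(self, action):
        dr, dc = MOVES[action]
        r, c = self.pos[0] + dr, self.pos[1] + dc
        inside = 0 <= r < self.size and 0 <= c < self.size
        if not inside or (r, c) in self.obstacles:
            # Assumed rule for this sketch: a blocked move costs -1, agent stays put.
            return self._observation(), -1.0, False, {}
        self.pos = (r, c)
        reward = float(self._sense())  # one point per newly revealed cell
        done = all(cell != UNKNOWN for row in self.known for cell in row)
        return self._observation(), reward, done, {}
```

A policy would interact with such an environment by calling `reset()` once per episode and then `step(action)` repeatedly until `done` is true. Because each `reset()` draws a new layout, a policy trained this way cannot simply memorize one map, which is the intuition behind using procedural generation to encourage generalization.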