MarsExplorer: Exploration of Unknown Terrains via Deep Reinforcement Learning and Procedurally Generated Environments

This paper is an initial endeavor to bridge the gap between powerful Deep Reinforcement Learning methodologies and the problem of exploration/coverage of unknown terrains. Within this scope, MarsExplorer, an OpenAI-Gym compatible environment tailored to the exploration/coverage of unknown areas, is presented. MarsExplorer translates the original robotics problem into a Reinforcement Learning setup that various off-the-shelf algorithms can tackle. Any learned policy can be applied directly to a robotic platform, without requiring an elaborate simulation model of the robot's dynamics or an additional learning/adaptation phase. One of its core features is the controllable, multi-dimensional procedural generation of terrains, which is key to producing policies with strong generalization capabilities. Four state-of-the-art RL algorithms (A3C, PPO, Rainbow, and SAC) are trained on the MarsExplorer environment, and their results are evaluated against average human-level performance. In the follow-up experimental analysis, the effect of the multi-dimensional difficulty setting on the learning capabilities of the best-performing algorithm (PPO) is analyzed. A milestone result is the emergence of an exploration policy that follows the Hilbert curve, without this information being provided to the environment and without directly or indirectly rewarding Hilbert-curve-like trajectories. The experimental analysis concludes by evaluating the PPO-learned policy side-by-side with frontier-based exploration strategies. A study of the performance curves reveals that the PPO-based policy performs adaptive-to-the-unknown-terrain sweeping without leaving expensive-to-revisit areas uncovered, underlining the capability of RL-based methodologies to tackle exploration tasks efficiently.
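Since MarsExplorer exposes the standard OpenAI-Gym interface, any Gym-compatible agent interacts with it through the usual reset/step loop, with each reset yielding a newly procedurally generated terrain. The sketch below illustrates that loop with a random policy; the package name `mars_explorer` and the environment id `"MarsExplorer-v0"` are assumptions made for illustration and may differ from the released code.

```python
# Minimal sketch of interacting with a Gym-compatible exploration
# environment via the classic (pre-0.26) gym API, which matches the
# paper's timeframe. Package name and environment id are assumptions.
import gym
import mars_explorer  # hypothetical package that registers the environment

env = gym.make("MarsExplorer-v0")  # hypothetical environment id

obs = env.reset()                  # each reset spawns a new procedurally generated terrain
done, episode_return = False, 0.0
while not done:
    action = env.action_space.sample()          # stand-in for a trained policy (e.g., PPO)
    obs, reward, done, info = env.step(action)  # classic 4-tuple step signature
    episode_return += reward

print("episode return:", episode_return)
env.close()
```

Because the environment follows this interface, off-the-shelf implementations of A3C, PPO, Rainbow, or SAC can, in principle, be trained on it without environment-specific glue code.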

Bibliographic Details
Main Authors: Dimitrios I. Koutras, Athanasios C. Kapoutsis, Angelos A. Amanatiadis, Elias B. Kosmatopoulos
Format: Article
Language: EN
Published: MDPI AG, 2021
Published in: Electronics, Vol 10, Iss 2751, p 2751 (2021); ISSN 2079-9292
DOI: 10.3390/electronics10222751
Subjects: Deep Reinforcement Learning; OpenAI gym; exploration; unknown terrains; Electronics (TK7800-8360)
Online Access: https://doaj.org/article/89b0f32561004c3ea99cdb0421579ba6
Full Text: https://www.mdpi.com/2079-9292/10/22/2751