Optimizing agent behavior over long time scales by transporting value
People are able to mentally time travel to distant memories and reflect on the consequences of those past events. Here, the authors show how a mechanism that connects learning from delayed rewards with memory retrieval can enable AI agents to discover links between past events to help decide better...
Main authors: | Chia-Chun Hung, Timothy Lillicrap, Josh Abramson, Yan Wu, Mehdi Mirza, Federico Carnevale, Arun Ahuja, Greg Wayne |
---|---|
Format: | article |
Language: | EN |
Published: | Nature Portfolio, 2019 |
Subjects: | Science; Q |
Online access: | https://doaj.org/article/2e805e7216b343f9a8b03a4f5ce02aad |
id | oai:doaj.org-article:2e805e7216b343f9a8b03a4f5ce02aad |
---|---|
record_format | dspace |
spelling | oai:doaj.org-article:2e805e7216b343f9a8b03a4f5ce02aad; 2021-12-02T15:35:10Z; Optimizing agent behavior over long time scales by transporting value; 10.1038/s41467-019-13073-w; 2041-1723; https://doaj.org/article/2e805e7216b343f9a8b03a4f5ce02aad; 2019-11-01T00:00:00Z; https://doi.org/10.1038/s41467-019-13073-w; https://doaj.org/toc/2041-1723; People are able to mentally time travel to distant memories and reflect on the consequences of those past events. Here, the authors show how a mechanism that connects learning from delayed rewards with memory retrieval can enable AI agents to discover links between past events to help decide better courses of action in the future.; Chia-Chun Hung; Timothy Lillicrap; Josh Abramson; Yan Wu; Mehdi Mirza; Federico Carnevale; Arun Ahuja; Greg Wayne; Nature Portfolio; article; Science; Q; EN; Nature Communications, Vol 10, Iss 1, Pp 1-12 (2019) |
institution | DOAJ |
collection | DOAJ |
language | EN |
topic | Science Q |
spellingShingle | Science Q Chia-Chun Hung Timothy Lillicrap Josh Abramson Yan Wu Mehdi Mirza Federico Carnevale Arun Ahuja Greg Wayne Optimizing agent behavior over long time scales by transporting value |
description | People are able to mentally time travel to distant memories and reflect on the consequences of those past events. Here, the authors show how a mechanism that connects learning from delayed rewards with memory retrieval can enable AI agents to discover links between past events to help decide better courses of action in the future. |
format | article |
author | Chia-Chun Hung Timothy Lillicrap Josh Abramson Yan Wu Mehdi Mirza Federico Carnevale Arun Ahuja Greg Wayne |
author_facet | Chia-Chun Hung Timothy Lillicrap Josh Abramson Yan Wu Mehdi Mirza Federico Carnevale Arun Ahuja Greg Wayne |
author_sort | Chia-Chun Hung |
title | Optimizing agent behavior over long time scales by transporting value |
title_short | Optimizing agent behavior over long time scales by transporting value |
title_full | Optimizing agent behavior over long time scales by transporting value |
title_fullStr | Optimizing agent behavior over long time scales by transporting value |
title_full_unstemmed | Optimizing agent behavior over long time scales by transporting value |
title_sort | optimizing agent behavior over long time scales by transporting value |
publisher | Nature Portfolio |
publishDate | 2019 |
url | https://doaj.org/article/2e805e7216b343f9a8b03a4f5ce02aad |
work_keys_str_mv | AT chiachunhung optimizingagentbehavioroverlongtimescalesbytransportingvalue AT timothylillicrap optimizingagentbehavioroverlongtimescalesbytransportingvalue AT joshabramson optimizingagentbehavioroverlongtimescalesbytransportingvalue AT yanwu optimizingagentbehavioroverlongtimescalesbytransportingvalue AT mehdimirza optimizingagentbehavioroverlongtimescalesbytransportingvalue AT federicocarnevale optimizingagentbehavioroverlongtimescalesbytransportingvalue AT arunahuja optimizingagentbehavioroverlongtimescalesbytransportingvalue AT gregwayne optimizingagentbehavioroverlongtimescalesbytransportingvalue |
_version_ | 1718386649085247488 |