Optimizing agent behavior over long time scales by transporting value

People are able to mentally time travel to distant memories and reflect on the consequences of those past events. Here, the authors show how a mechanism that connects learning from delayed rewards with memory retrieval can enable AI agents to discover links between past events to help decide better courses of action in the future.

Bibliographic Details
Main authors: Chia-Chun Hung, Timothy Lillicrap, Josh Abramson, Yan Wu, Mehdi Mirza, Federico Carnevale, Arun Ahuja, Greg Wayne
Format: article
Language: EN
Published: Nature Portfolio 2019
Subjects:
Q
Online access: https://doaj.org/article/2e805e7216b343f9a8b03a4f5ce02aad
id oai:doaj.org-article:2e805e7216b343f9a8b03a4f5ce02aad
record_format dspace
spelling oai:doaj.org-article:2e805e7216b343f9a8b03a4f5ce02aad
2021-12-02T15:35:10Z
Optimizing agent behavior over long time scales by transporting value
10.1038/s41467-019-13073-w
2041-1723
https://doaj.org/article/2e805e7216b343f9a8b03a4f5ce02aad
2019-11-01T00:00:00Z
https://doi.org/10.1038/s41467-019-13073-w
https://doaj.org/toc/2041-1723
People are able to mentally time travel to distant memories and reflect on the consequences of those past events. Here, the authors show how a mechanism that connects learning from delayed rewards with memory retrieval can enable AI agents to discover links between past events to help decide better courses of action in the future.
Chia-Chun Hung
Timothy Lillicrap
Josh Abramson
Yan Wu
Mehdi Mirza
Federico Carnevale
Arun Ahuja
Greg Wayne
Nature Portfolio
article
Science
Q
EN
Nature Communications, Vol 10, Iss 1, Pp 1-12 (2019)
institution DOAJ
collection DOAJ
language EN
topic Science
Q
description People are able to mentally time travel to distant memories and reflect on the consequences of those past events. Here, the authors show how a mechanism that connects learning from delayed rewards with memory retrieval can enable AI agents to discover links between past events to help decide better courses of action in the future.
format article
author Chia-Chun Hung
Timothy Lillicrap
Josh Abramson
Yan Wu
Mehdi Mirza
Federico Carnevale
Arun Ahuja
Greg Wayne
author_sort Chia-Chun Hung
title Optimizing agent behavior over long time scales by transporting value
publisher Nature Portfolio
publishDate 2019
url https://doaj.org/article/2e805e7216b343f9a8b03a4f5ce02aad
_version_ 1718386649085247488