Temporal-difference reinforcement learning with distributed representations.

Temporal-difference (TD) algorithms have been proposed as models of reinforcement learning (RL). We examine two issues of distributed representation in these TD algorithms: distributed representations of belief and distributed discounting factors. Distributed representation of belief allows the believed state of the world to distribute across sets of equivalent states. Distributed exponential discounting factors produce hyperbolic discounting in the behavior of the agent itself. We examine these issues in the context of a TD RL model in which state-belief is distributed over a set of exponentially-discounting "micro-Agents", each of which has a separate discounting factor (gamma). Each microAgent maintains an independent hypothesis about the state of the world, and a separate value-estimate of taking actions within that hypothesized state. The overall agent thus instantiates a flexible representation of an evolving world-state. As with other TD models, the value-error (delta) signal within the model matches dopamine signals recorded from animals in standard conditioning reward-paradigms. The distributed representation of belief provides an explanation for the decrease in dopamine at the conditioned stimulus seen in overtrained animals, for the differences between trace and delay conditioning, and for transient bursts of dopamine seen at movement initiation. Because each microAgent also includes its own exponential discounting factor, the overall agent shows hyperbolic discounting, consistent with behavioral experiments.
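A minimal sketch of the two mechanisms the abstract describes, under stated assumptions: this is illustrative Python, not the authors' implementation, and the uniform spread of gamma values plus the toy reward and value numbers are assumptions. Each microAgent is an ordinary TD(0) learner with its own exponential discount factor, and the overall agent's discount curve is the average across microAgents. For gammas drawn uniformly on [0, 1), the mean of gamma^t is exactly 1/(1+t), i.e. hyperbolic with k = 1.

    import numpy as np

    # Illustrative sketch, not the paper's code. Assumptions: gammas spread
    # uniformly over [0, 1); reward and value numbers are toy values.
    rng = np.random.default_rng(0)
    n_agents = 50_000
    gammas = rng.uniform(0.0, 1.0, n_agents)  # one discount factor per microAgent

    # (1) Per-microAgent TD(0) error for one transition s -> s' with reward r.
    # Each microAgent keeps its own value estimate under its own state hypothesis.
    V_s = np.zeros(n_agents)      # value of the current hypothesized state
    V_next = np.ones(n_agents)    # value of the successor state (toy value)
    r, alpha = 0.0, 0.1
    delta = r + gammas * V_next - V_s  # one value-error (delta) per microAgent
    V_s += alpha * delta               # standard TD(0) update

    # (2) The overall agent's discount curve is the mean across microAgents.
    # For uniform gammas, E[gamma ** t] = 1 / (1 + t): hyperbolic with k = 1.
    delays = np.arange(50)
    distributed = (gammas[:, None] ** delays[None, :]).mean(axis=0)
    for t in (1, 5, 10, 25, 49):
        print(f"t={t:2d}  mean(gamma^t)={distributed[t]:.4f}  1/(1+t)={1/(1+t):.4f}")

Running the sketch shows the averaged curve tracking 1/(1+t) closely, matching the abstract's claim that distributed exponential discounting yields hyperbolic discounting in the overall agent.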


Bibliographic Details
Main Authors: Zeb Kurth-Nelson, A David Redish
Format: article
Language: EN
Published: Public Library of Science (PLoS), 2009
Subjects:
Medicine (R)
Science (Q)
Online Access: https://doaj.org/article/10b71edf81334d619f75d3ba97df1661
id oai:doaj.org-article:10b71edf81334d619f75d3ba97df1661
record_format dspace
spelling oai:doaj.org-article:10b71edf81334d619f75d3ba97df1661 (updated 2021-11-25T06:28:38Z)
title Temporal-difference reinforcement learning with distributed representations.
issn 1932-6203
doi 10.1371/journal.pone.0007362
url https://doaj.org/article/10b71edf81334d619f75d3ba97df1661
url https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/19841749/pdf/?tool=EBI
url https://doaj.org/toc/1932-6203
publishDate 2009-10-01T00:00:00Z
description (full abstract; given verbatim in the description field below)
author Zeb Kurth-Nelson
author A David Redish
publisher Public Library of Science (PLoS)
format article
topic Medicine (R); Science (Q)
language EN
citation PLoS ONE, Vol 4, Iss 10, p e7362 (2009)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
spellingShingle Medicine
R
Science
Q
Zeb Kurth-Nelson
A David Redish
Temporal-difference reinforcement learning with distributed representations.
description Temporal-difference (TD) algorithms have been proposed as models of reinforcement learning (RL). We examine two issues of distributed representation in these TD algorithms: distributed representations of belief and distributed discounting factors. Distributed representation of belief allows the believed state of the world to distribute across sets of equivalent states. Distributed exponential discounting factors produce hyperbolic discounting in the behavior of the agent itself. We examine these issues in the context of a TD RL model in which state-belief is distributed over a set of exponentially-discounting "micro-Agents", each of which has a separate discounting factor (gamma). Each microAgent maintains an independent hypothesis about the state of the world, and a separate value-estimate of taking actions within that hypothesized state. The overall agent thus instantiates a flexible representation of an evolving world-state. As with other TD models, the value-error (delta) signal within the model matches dopamine signals recorded from animals in standard conditioning reward-paradigms. The distributed representation of belief provides an explanation for the decrease in dopamine at the conditioned stimulus seen in overtrained animals, for the differences between trace and delay conditioning, and for transient bursts of dopamine seen at movement initiation. Because each microAgent also includes its own exponential discounting factor, the overall agent shows hyperbolic discounting, consistent with behavioral experiments.
format article
author Zeb Kurth-Nelson
A David Redish
author_facet Zeb Kurth-Nelson
A David Redish
author_sort Zeb Kurth-Nelson
title Temporal-difference reinforcement learning with distributed representations.
title_short Temporal-difference reinforcement learning with distributed representations.
title_full Temporal-difference reinforcement learning with distributed representations.
title_fullStr Temporal-difference reinforcement learning with distributed representations.
title_full_unstemmed Temporal-difference reinforcement learning with distributed representations.
title_sort temporal-difference reinforcement learning with distributed representations.
publisher Public Library of Science (PLoS)
publishDate 2009
url https://doaj.org/article/10b71edf81334d619f75d3ba97df1661
work_keys_str_mv AT zebkurthnelson temporaldifferencereinforcementlearningwithdistributedrepresentations
AT adavidredish temporaldifferencereinforcementlearningwithdistributedrepresentations
_version_ 1718413663310708736