Reinforcement learning using a continuous time actor-critic framework with spiking neurons.

Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only been partially elucidated. On one hand, experimental evidence shows that the neuromodulator dopamine carries information about rewards and affects synaptic plasticity. On the other hand, the theory of re...

Bibliographic Details
Main Authors:	Nicolas Frémaux, Henning Sprekeler, Wulfram Gerstner
Format:	article
Language:	EN
Published:	Public Library of Science (PLoS) 2013
Subjects:	Biology (General) QH301-705.5
Online Access:	https://doaj.org/article/460ef17dce144fe7852575a719650b18

id	oai:doaj.org-article:460ef17dce144fe7852575a719650b18
record_format	dspace
spelling	oai:doaj.org-article:460ef17dce144fe7852575a719650b18 | 2021-11-18T05:52:14Z | Reinforcement learning using a continuous time actor-critic framework with spiking neurons. | 1553-734X | 1553-7358 | 10.1371/journal.pcbi.1003024 | https://doaj.org/article/460ef17dce144fe7852575a719650b18 | 2013-04-01T00:00:00Z | https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/23592970/?tool=EBI | https://doaj.org/toc/1553-734X | https://doaj.org/toc/1553-7358 | Nicolas Frémaux | Henning Sprekeler | Wulfram Gerstner | Public Library of Science (PLoS) | article | Biology (General) | QH301-705.5 | EN | PLoS Computational Biology, Vol 9, Iss 4, p e1003024 (2013)
institution	DOAJ
collection	DOAJ
language	EN
topic	Biology (General) QH301-705.5
spellingShingle	Biology (General) QH301-705.5 Nicolas Frémaux Henning Sprekeler Wulfram Gerstner Reinforcement learning using a continuous time actor-critic framework with spiking neurons.
description	Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only been partially elucidated. On one hand, experimental evidence shows that the neuromodulator dopamine carries information about rewards and affects synaptic plasticity. On the other hand, the theory of reinforcement learning provides a framework for reward-based learning. Recent models of reward-modulated spike-timing-dependent plasticity have made first steps towards bridging the gap between the two approaches, but faced two problems. First, reinforcement learning is typically formulated in a discrete framework, ill-adapted to the description of natural situations. Second, biologically plausible models of reward-modulated spike-timing-dependent plasticity require precise calculation of the reward prediction error, yet it remains to be shown how this can be computed by neurons. Here we propose a solution to these problems by extending the continuous temporal difference (TD) learning of Doya (2000) to the case of spiking neurons in an actor-critic network operating in continuous time, and with continuous state and action representations. In our model, the critic learns to predict expected future rewards in real time. Its activity, together with actual rewards, conditions the delivery of a neuromodulatory TD signal to itself and to the actor, which is responsible for action choice. In simulations, we show that such an architecture can solve a Morris water-maze-like navigation task, in a number of trials consistent with reported animal performance. We also use our model to solve the acrobot and the cartpole problems, two complex motor control tasks. Our model provides a plausible way of computing reward prediction error in the brain. Moreover, the analytically derived learning rule is consistent with experimental evidence for dopamine-modulated spike-timing-dependent plasticity.
format	article
author	Nicolas Frémaux; Henning Sprekeler; Wulfram Gerstner
author_facet	Nicolas Frémaux; Henning Sprekeler; Wulfram Gerstner
author_sort	Nicolas Frémaux
title	Reinforcement learning using a continuous time actor-critic framework with spiking neurons.
publisher	Public Library of Science (PLoS)
publishDate	2013
url	https://doaj.org/article/460ef17dce144fe7852575a719650b18
work_keys_str_mv	AT nicolasfremaux reinforcementlearningusingacontinuoustimeactorcriticframeworkwithspikingneurons AT henningsprekeler reinforcementlearningusingacontinuoustimeactorcriticframeworkwithspikingneurons AT wulframgerstner reinforcementlearningusingacontinuoustimeactorcriticframeworkwithspikingneurons
_version_	1718424722541117440
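
The description above says the model extends the continuous temporal difference (TD) learning of Doya (2000), with a critic that predicts expected future rewards in real time. As a rough illustration only, the sketch below evaluates the continuous-time TD error in the form delta(t) = r(t) - V(t)/tau + dV/dt, approximating dV/dt with a finite difference. It is a plain scalar calculation, not the spiking network of the article; the function name `td_error`, its parameters, and the numerical values are assumptions chosen for the example.

```python
def td_error(v_prev: float, v_now: float, reward: float, dt: float, tau: float) -> float:
    """Continuous-time TD error: delta = r - V/tau + dV/dt.

    dV/dt is approximated by a backward finite difference over one step dt.
    All names here are hypothetical; this is not the article's spiking model.
    """
    dv_dt = (v_now - v_prev) / dt
    return reward - v_now / tau + dv_dt


# Illustrative values (assumed, not taken from the article).
r0, tau, dt = 0.5, 4.0, 0.01

# With a constant reward r0, the exponentially discounted value is V = r0 * tau.
# A critic that already predicts this value should see no error.
delta_consistent = td_error(v_prev=r0 * tau, v_now=r0 * tau, reward=r0, dt=dt, tau=tau)

# A critic that predicts nothing (V = 0) sees an error equal to the reward.
delta_surprise = td_error(v_prev=0.0, v_now=0.0, reward=r0, dt=dt, tau=tau)

print(delta_consistent, delta_surprise)
```

In an actor-critic setting, a signal of this kind would be broadcast to both the critic and the actor to gate their weight updates; how the article realizes this with spiking neurons and a neuromodulatory signal is not reproduced here.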
