Temporal difference learning does not always lead to STDP
Răzvan V. Florian, Cătălin V. Rusu
It has been previously shown that a form of temporal difference (TD) learning for predicting the membrane potential of a neuron at a fixed delay after the neuron receives a presynaptic spike results in a learning rule very similar to Hebbian spike-timing-dependent plasticity (STDP) (Rao and Sejnowski, 2001). Since this result was obtained using a relatively complex neural model (two-compartment, with Hodgkin-Huxley-like currents) and a simple setup (a single presynaptic spike followed by a single current pulse), we investigated whether it holds for simpler neural models and more general situations. This question matters both for the theoretical understanding of the phenomenon and for verifying that it holds in common simulations of spiking neural networks.
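The TD update can be sketched in a few lines of Python. This is a minimal sketch assuming the simplest discrete-time form of such a rule: each presynaptic spike at time step t changes the weight by learning_rate * (V(t + delay) - V(t)), the error of predicting the membrane potential `delay` steps ahead. The function name `td_weight_change` and its parameters are hypothetical, and the rule used by Rao and Sejnowski (2001) may differ in detail (for example, in how presynaptic activity is represented).

```python
import numpy as np

def td_weight_change(v, pre_spike_steps, delay_steps, learning_rate=1.0):
    """Hypothetical TD rule (assumed form): every presynaptic spike at step t
    contributes learning_rate * (V[t + delay_steps] - V[t]) to the weight change."""
    v = np.asarray(v, dtype=float)
    dw = 0.0
    for t in pre_spike_steps:
        # Skip spikes whose prediction target lies beyond the recorded trace.
        if t + delay_steps < len(v):
            dw += learning_rate * (v[t + delay_steps] - v[t])
    return float(dw)
```

For example, `td_weight_change([0.0, 0.5, 1.0, 0.2], [0], 2)` returns 1.0: the potential two steps after the presynaptic spike is higher than at the spike, so under this assumed rule the weight grows.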
We studied the same phenomenon using both integrate-and-fire (IAF) and Izhikevich neurons, through simulations and, for the IAF neuron, also analytically. Postsynaptic spikes were generated by single current pulses (as in the original study), by the irregular firing of synaptic afferents, or by a constant input current.
For the IAF neuron, we found a Hebbian, STDP-like plasticity rule only when postsynaptic spikes were generated by a single current pulse and the reset potential of the neuron was positive. For the same input and a negative reset potential, as well as for a constant input current (regardless of the reset potential), the resulting plasticity rule was anti-Hebbian.
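The pulse-driven IAF case can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the model of the study: a leaky IAF neuron with rest at 0 and threshold 1 (arbitrary units), forward-Euler integration with dt = 0.1 ms and tau = 10 ms, an instantaneous spike followed by reset, a one-step current pulse strong enough to cross threshold, and no EPSP from the learning synapse itself. The helper names `simulate_iaf_pulse` and `td_curve`, and all parameter values, are hypothetical.

```python
import numpy as np

def simulate_iaf_pulse(v_reset, t_post_ms, duration_ms=200.0, dt=0.1,
                       tau=10.0, threshold=1.0, pulse_amp=200.0):
    """Leaky IAF (rest = 0) pushed over threshold by a one-step current
    pulse at t_post_ms; returns the sampled membrane trace."""
    n = int(round(duration_ms / dt))
    k_post = int(round(t_post_ms / dt))
    v = np.zeros(n)
    v_curr = 0.0
    for k in range(n):
        i_ext = pulse_amp if k == k_post else 0.0
        # Forward-Euler step of tau * dV/dt = -V + I.
        v_curr += dt / tau * (-v_curr + i_ext)
        if v_curr >= threshold:
            v_curr = v_reset  # instantaneous spike, then reset
        v[k] = v_curr
    return v

def td_curve(v_reset, lags_ms, t_post_ms=100.0, td_delay_ms=10.0, dt=0.1):
    """Assumed TD change V(t_pre + delay) - V(t_pre) for each lag = t_post - t_pre."""
    v = simulate_iaf_pulse(v_reset, t_post_ms, dt=dt)
    d = int(round(td_delay_ms / dt))
    dws = []
    for lag in lags_ms:
        k_pre = int(round((t_post_ms - lag) / dt))
        dws.append(v[k_pre + d] - v[k_pre])
    return np.array(dws)
```

Under these assumptions, the only deviation of V from rest is the decaying reset after the postsynaptic spike. Hence `td_curve(0.5, [5.0, -5.0])` gives a positive change when the presynaptic spike precedes the postsynaptic one by 5 ms and a negative change when it follows by 5 ms, while `td_curve(-0.5, [5.0, -5.0])` reverses both signs.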
For the Izhikevich neuron, we obtained Hebbian STDP-like plasticity for both constant and pulsed input current. This differs qualitatively from the IAF case because the Izhikevich neuron incorporates the dynamics of the membrane potential during the onset of the action potential. When we add an action potential of non-zero duration to the IAF model, the shape of the plasticity function changes significantly and becomes similar to Hebbian STDP. This shows that the plasticity function resulting from TD learning depends critically on whether or not the neuron adapts its synapses to learn the shape of its own action potential. However, the shape of the action potential is commonly considered not to carry information. When we consider only the TD learning of the subthreshold dynamics of the membrane potential, the resulting learning function can lose its similarity with Hebbian STDP.
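A TD plasticity curve for the Izhikevich neuron under constant input can be computed along the same lines. This sketch assumes the standard "regular spiking" parameter set of the Izhikevich model (a = 0.02, b = 0.2, c = -65, d = 8), forward-Euler integration with dt = 0.1 ms, spike peaks recorded at +30 mV, and a hypothetical TD delay of 5 ms; the function names and parameter choices are illustrative, not those of the study. Because the recorded trace includes the upstroke of the membrane potential before the +30 mV cutoff, the TD difference sees the onset dynamics of the action potential.

```python
import numpy as np

def simulate_izhikevich(i_const, duration_ms=500.0, dt=0.1,
                        a=0.02, b=0.2, c=-65.0, d=8.0):
    """Forward-Euler Izhikevich neuron with constant input current;
    returns the membrane trace (mV) and the step indices of spikes."""
    n = int(round(duration_ms / dt))
    v, u = c, b * c
    trace = np.empty(n)
    spikes = []
    for k in range(n):
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + i_const)
        u += dt * a * (b * v - u)
        if v >= 30.0:
            # Record every spike peak as +30 mV (an Euler step may overshoot), then reset.
            trace[k] = 30.0
            v, u = c, u + d
            spikes.append(k)
        else:
            trace[k] = v
    return trace, spikes

def td_curve_around_spike(trace, k_post, lags_ms, td_delay_ms=5.0, dt=0.1):
    """Assumed TD change V(t_pre + delay) - V(t_pre) for each lag = t_post - t_pre."""
    d = int(round(td_delay_ms / dt))
    dws = []
    for lag in lags_ms:
        k_pre = k_post - int(round(lag / dt))
        dws.append(trace[k_pre + d] - trace[k_pre])
    return np.array(dws)
```

A possible use is `trace, spikes = simulate_izhikevich(10.0)`, then `td_curve_around_spike(trace, spikes[len(spikes) // 2], np.arange(-20.0, 20.5, 0.5))`. With constant input the neuron fires repetitively, so lags longer than the interspike interval also pick up neighbouring spikes.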
For both neural models, in the case of irregular synaptic input there was no clear relationship between the plastic changes predicted by TD learning and the temporal delay between the pre- and postsynaptic spikes. Moreover, the sign of these plastic changes did not depend uniquely on the sign of the temporal delay.
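For irregular synaptic input, the relationship between the TD change and the pre-post delay can be examined by collecting one (lag, change) pair per presynaptic spike. This is a minimal sketch under hypothetical assumptions: a leaky IAF neuron (rest 0, threshold 1, reset 0) driven by 100 Poisson afferents at 20 Hz, each input spike adding an instantaneous jump of 0.1 to the potential, afferent 0 treated as the synapse whose TD change is measured, and the lag taken to the nearest postsynaptic spike. The function name and all parameter values are illustrative.

```python
import numpy as np

def irregular_input_td_pairs(n_afferents=100, rate_hz=20.0, w=0.1,
                             duration_ms=1000.0, dt=0.1, tau=10.0,
                             threshold=1.0, v_reset=0.0,
                             td_delay_ms=10.0, seed=0):
    """Return (t_post - t_pre, V(t_pre + delay) - V(t_pre)) for each spike of afferent 0."""
    rng = np.random.default_rng(seed)
    n = int(round(duration_ms / dt))
    # Bernoulli approximation of Poisson spiking: probability rate * dt per time step.
    pre = rng.random((n, n_afferents)) < rate_hz * dt / 1000.0
    v = np.zeros(n)
    v_curr = 0.0
    post = []
    for k in range(n):
        # Forward-Euler leak plus instantaneous jumps from all afferents spiking in this step.
        v_curr += -dt / tau * v_curr + w * pre[k].sum()
        if v_curr >= threshold:
            post.append(k)
            v_curr = v_reset
        v[k] = v_curr
    post = np.array(post)
    d = int(round(td_delay_ms / dt))
    pairs = []
    for k_pre in np.flatnonzero(pre[:, 0]):
        if k_pre + d >= n or post.size == 0:
            continue
        dw = v[k_pre + d] - v[k_pre]
        # Lag to the nearest postsynaptic spike, in ms (positive: pre before post).
        k_near = post[np.argmin(np.abs(post - k_pre))]
        pairs.append((float((k_near - k_pre) * dt), float(dw)))
    return pairs
```

Plotting the returned pairs as a scatter of change against lag, or counting how often the sign of the change matches the sign of the lag, is one way to check whether the sign of the plastic change is determined by the sign of the delay in a given simulation.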
In conclusion, TD learning in spiking neurons does not always lead to Hebbian STDP.