Reward-modulated spike-timing-dependent plasticity (STDP) is considered a strong candidate for a learning rule that explains behaviorally relevant weight changes in networks of spiking neurons. This scheme requires, for every synapse, an exponentially decaying eligibility trace with a time constant of several seconds. Because neural networks of interest often have millions of synapses, implementing such a trace for every synapse can be very expensive in terms of silicon area.
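The per-synapse state described above can be illustrated with a minimal software sketch. It is not taken from this document: the class name `Synapse`, the parameters `tau_c_s` and `lr`, the discrete-time update, and the form of the weight update (reward multiplied by the trace) are all assumptions chosen for illustration, following a commonly used formulation of reward-modulated STDP.

```python
import math


class Synapse:
    """Hypothetical synapse model holding a weight and an eligibility trace."""

    def __init__(self, weight=0.5, tau_c_s=1.0):
        self.weight = weight      # synaptic weight
        self.trace = 0.0          # eligibility trace (per-synapse state)
        self.tau_c_s = tau_c_s    # assumed trace time constant, in seconds

    def step(self, dt_s, reward=0.0, stdp_event=0.0, lr=0.01):
        """Advance the synapse by dt_s seconds.

        stdp_event: STDP contribution from a pre/post spike pairing in this
                    step (0.0 if no pairing occurred).
        reward:     global reward signal during this step.
        lr:         assumed learning rate.
        """
        # The trace decays exponentially with time constant tau_c_s.
        self.trace *= math.exp(-dt_s / self.tau_c_s)
        # A spike pairing is stored in the trace instead of changing the weight.
        self.trace += stdp_event
        # The weight changes only when reward coincides with a nonzero trace.
        self.weight += lr * reward * self.trace * dt_s


# Usage: a pairing is tagged at t = 0 and a reward arrives 2 s later.
syn = Synapse(tau_c_s=2.0)
syn.step(0.0, stdp_event=1.0)   # pairing sets the trace to 1.0
syn.step(2.0)                   # 2 s pass with no reward; trace decays
syn.step(1.0, reward=1.0)       # reward converts the remaining trace to a weight change
```

In this sketch every synapse carries its own `trace` value and decay, which is the per-synapse cost that becomes significant when the same state has to be held in silicon for millions of synapses.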
A direct hardware implementation of the eligibility trace can place analog or digital circuits in each synapse to realize the exponential decay with the required time constant. Such a solution may require hundreds of square microns per synapse. Therefore, an area-efficient implementation of reward-modulated STDP is desirable.
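To give a rough sense of scale, the following worked estimate takes illustrative values at the low end of the ranges mentioned above. Both figures are assumptions for the sake of the arithmetic, not measurements: $N = 10^{6}$ synapses and $A_{\text{syn}} = 100\ \mu\text{m}^2$ of trace circuitry per synapse.

\[
A_{\text{total}} = N \cdot A_{\text{syn}} = 10^{6} \times 100\ \mu\text{m}^2 = 10^{8}\ \mu\text{m}^2 = 100\ \text{mm}^2
\]

Under these assumptions, the eligibility-trace circuitry alone would occupy on the order of one square centimeter of silicon, before accounting for neurons, weight storage, or routing.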