Rewarded spike-timing-dependent plasticity (STDP) has been implicated as a possible learning mechanism in a variety of brain systems and in artificial neural networks. This mechanism combines unsupervised STDP, which modifies synaptic strength depending on the relative timing of presynaptic input and postsynaptic spikes, with a reinforcement signal that modulates the synaptic changes. Artificial neural networks seek to duplicate the ability of biological neural networks to solve complex problems. Thus, one goal in implementing an artificial neural network is for it to learn and solve complex problems without input from a programmer or user.
However, neural networking is a relatively new science, and validating whether current models can solve problems is difficult. Typically, current models are validated only by comparison with experimental data, which by itself usually does not guarantee that these models are capable of problem solving. This disclosure introduces a model that is capable of learning and of making decisions based on what it has learned, and that is therefore validated by its problem-solving capability.
Neural networks have been applied to problem solving before, in the form of artificial neural networks. Artificial neural networks have been theorized to solve problems since Alan Turing's B-type machines in the 1940s. These networks typically consist of three layers of neurons: an input layer, a hidden layer, and an output layer, connected in an all-to-all feed-forward pattern between layers. Each neuron computes a nonlinear function of the sum of its inputs, with each input scaled by the strength of its incoming connection. Ultimately, the network is a function that can be used to analyze data by presenting the data to the input layer and reading the resulting values at the output layer. A typical use of such a network is to solve classification problems.
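The three-layer feed-forward arrangement described above can be sketched as follows. This is a minimal illustration, not the model of this disclosure: the sigmoid nonlinearity, the bias terms, the layer sizes, the random weights, and all function names are assumptions chosen for the example.

```python
import math
import random


def sigmoid(x):
    """Assumed nonlinearity; the text only requires some nonlinear function."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights, biases):
    """Fully connected layer: each neuron applies the nonlinearity to the
    sum of its inputs, each scaled by its incoming connection strength."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def make_layer(n_in, n_out, rng):
    """Hypothetical helper: random all-to-all weights and zero biases."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(inputs, hidden_layer, output_layer):
    """Present data to the input layer and read the output layer."""
    hidden = layer_forward(inputs, *hidden_layer)
    return layer_forward(hidden, *output_layer)


rng = random.Random(0)                # fixed seed so the sketch is repeatable
hidden_layer = make_layer(4, 3, rng)  # 4 inputs -> 3 hidden neurons (assumed sizes)
output_layer = make_layer(3, 2, rng)  # 3 hidden -> 2 output neurons (assumed sizes)

outputs = forward([0.2, 0.9, 0.1, 0.5], hidden_layer, output_layer)
predicted_class = outputs.index(max(outputs))  # classification by largest output
```

With untrained random weights the predicted class is arbitrary; the sketch only shows how the network acts as a function from an input vector to output values.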
Further development of these types of models has led to vastly improved capabilities. However, despite mimicking biological networks in many respects, these models differ from them in major ways. Canonically, artificial neural networks use backpropagation to enable reinforcement or supervised learning. It has long been known that biological networks do not use this powerful technique and instead use some variant of Hebbian plasticity. Furthermore, artificial neural networks can ignore issues of homeostasis due to the lack of a temporal dimension. This stems from the neurons not being constrained to all-or-nothing output and synaptic communication, as most biological neurons are, which also greatly attenuates the signal-to-noise problem. Finally, artificial networks can avoid the distal reward problem because input and reward can be artificially correlated in time.
The distal reward problem arises because the mechanisms of reinforcement learning must depend on both the network activity and a reward signal. In a biological organism, the reward is often not received until several seconds after the activity that produced the correct response. This raises the question of how such a strategy can be implemented in computational algorithms that mimic biological systems. By the time the reward signal arrives, the relevant activity has long since subsided, and the relevant neurons and connections may well have been involved in other activities in the interim. Rewarded spike-timing-dependent plasticity has been proposed as a solution to this problem. Under this proposal, spike-timing-dependent traces are created and in some way stored at a synaptic terminal whenever the presynaptic and postsynaptic neurons both fire. These traces are positive when the presynaptic neuron fires first, and they are stronger the closer together in time the two events occur. When these traces are later reinforced by a reward signal (often believed to be dopamine), they are converted into long-term changes in synaptic strength.
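The trace-then-reward idea described above can be sketched for a single synapse as follows. This is a minimal illustration under stated assumptions, not the implementation of this disclosure: the exponential STDP window, the exponential decay of stored traces, every constant, and all function names are hypothetical choices made for the example.

```python
import math

# All constants below are assumptions for illustration only.
A_PLUS = 1.0        # trace amplitude when the presynaptic neuron fires first
A_MINUS = 1.0       # trace amplitude when the postsynaptic neuron fires first
TAU_STDP = 20.0     # width of the spike-timing window, in ms
TAU_TRACE = 1000.0  # decay time constant of a stored trace, in ms
LEARNING_RATE = 0.01


def stdp_trace(t_pre, t_post):
    """Trace created by one pre/post spike pair.

    Positive when the presynaptic spike comes first, negative otherwise,
    and larger in magnitude the closer the two spikes are in time.
    Simultaneous spikes are treated as pre-first (an arbitrary choice).
    """
    dt = t_post - t_pre
    if dt >= 0:
        return A_PLUS * math.exp(-dt / TAU_STDP)
    return -A_MINUS * math.exp(dt / TAU_STDP)


def trace_at(value, t_created, t_now):
    """Value of a stored trace after decaying from t_created to t_now."""
    return value * math.exp(-(t_now - t_created) / TAU_TRACE)


def apply_reward(weight, traces, reward, t_reward):
    """Convert the surviving traces into a lasting weight change.

    `traces` is a list of (value, t_created) pairs. Without a reward
    (reward == 0) the weight is left unchanged.
    """
    eligibility = sum(
        trace_at(value, t0, t_reward) for value, t0 in traces if t0 <= t_reward
    )
    return weight + LEARNING_RATE * reward * eligibility


# Pre fires at 100 ms, post at 105 ms; the reward arrives 2 s later.
traces = [(stdp_trace(t_pre=100.0, t_post=105.0), 105.0)]
new_weight = apply_reward(weight=0.5, traces=traces, reward=1.0, t_reward=2105.0)
```

Because the stored trace decays slowly compared with the spike-timing window, a reward that arrives seconds after the spike pairing can still strengthen the synapse that contributed to the rewarded activity, which is how this scheme addresses the distal reward problem.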