1. Field of the Disclosure
The present innovation relates to machine learning apparatus and methods and more particularly, in some implementations, to computer apparatus and methods for implementing reinforcement learning rules in artificial neural networks.
2. Artificial Neural Networks
An artificial neural network (ANN) is a mathematical or computational model (which may be embodied for example in computer logic or other apparatus) that is inspired by the structure and/or functional aspects of biological neural networks. A neural network comprises a group of artificial neurons that are interconnected by synaptic connections. Typically, an ANN is an adaptive system that is configured to change its structure (e.g., the connection configuration and/or neural states) based on external or internal information that flows through the network during the learning phase.
Artificial neural networks are used to model complex relationships between inputs and outputs or to find patterns in data, where the dependency between the inputs and the outputs cannot be easily attained (Hertz J., Krogh A., and Palmer R. (1991) Introduction to the Theory of Neural Networks, Addison-Wesley, incorporated herein by reference in its entirety).
Artificial neural networks offer improved performance over conventional technologies in areas which include machine vision, pattern detection and pattern recognition, signal filtering, data segmentation, data compression, data mining, system identification and control, optimization and scheduling, and complex mapping. For more details on applications of ANN see, e.g., Haykin, S., (1999), Neural Networks: A Comprehensive Foundation (Second Edition), Prentice-Hall or Fausett, L. V., (1994), Fundamentals of Neural Networks: Architectures, Algorithms and Applications, Prentice-Hall, each incorporated herein by reference in its entirety.
Neuron Models
An artificial neuron is a computational model inspired by natural, biological neurons. Biological neurons receive signals through specialized inputs called synapses. When the signals received are strong enough (surpass a certain threshold), the neuron is activated and emits a signal through its output. This signal might be sent to another synapse, and might activate other neurons. Signals transmitted between biological neurons are encoded in sequences of stereotypical short electrical impulses, called action potentials, pulses, or spikes.
Analog Neuron Models
The complexity of real neurons is highly abstracted when modeling artificial neurons. A schematic diagram of an artificial analog neuron is illustrated in FIG. 1A. The analog neuron 102 receives analog inputs via input connections 110 and produces an output signal y delivered via connection 108. The input connections are characterized by weights 104.
The model comprises a vector of inputs 106 x = [x_1, x_2, . . . , x_n]^T, a vector of weights 104 w = [w_1, . . . , w_n] (weights define the strength of the respective signals), and a mathematical function which determines the activation of the neuron's output 108. The activation function may have various forms. In the simplest neuron models, the activation function is a linear function and the neuron output is calculated as:
$$y = w x \qquad \text{(Eqn. 1)}$$
More details on artificial neural networks can be found e.g. in Hertz et al. (1991), discussed supra.
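By way of illustration only, the linear neuron output of Eqn. 1 may be sketched as a weighted sum of the inputs. The function name linear_neuron and the numeric values are hypothetical and do not appear in the disclosure.

```python
def linear_neuron(weights, inputs):
    """Return y = w x (Eqn. 1): the weighted sum of the inputs."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    return sum(w * x for w, x in zip(weights, inputs))


# Illustrative values only (not taken from the source).
w = [0.5, -1.0, 2.0]   # connection weights 104
x = [1.0, 2.0, 3.0]    # analog inputs 106
y = linear_neuron(w, x)  # 0.5*1.0 - 1.0*2.0 + 2.0*3.0 = 4.5
```

A non-linear activation function, where one is used, would be applied to this weighted sum.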
Spiking Neuron Models
Models of artificial neurons typically perform signal transmission by using the rate of the action potentials for encoding information. Hence, signals transmitted in these ANN models typically have analog (floating-point) representation.
To the contrary, spiking neurons or spiking neural networks (SNN) represent a special class of ANN, where neuron models communicate by sequences of spikes (see Gerstner W. and Kistler W. (2002) Spiking Neuron Models. Single Neurons, Populations, Plasticity, Cambridge University Press, incorporated herein by reference in its entirety).
Most common spiking neuron models use the timing of spikes, rather than the specific shape of spikes, in order to encode neural information. A spike “train” can be described as follows:
$$S(t) = \sum_f \delta(t - t_f), \qquad \text{(Eqn. 2)}$$
where f = 1, 2, . . . is the spike designator and δ(.) is the Dirac function with δ(t) = 0 for t ≠ 0 and
$$\int_{-\infty}^{\infty} \delta(t)\,dt = 1 \qquad \text{(Eqn. 3)}$$
Various spiking neuron models exist, such as, for example: Integrate-and-Fire (IF) and Leaky-Integrate-and-Fire (LIF) (see e.g., Stein, R. B., (1967). Some models of neural variability. Biophys. J., 7: 37-68, incorporated herein by reference in its entirety). The dynamics of an exemplary LIF unit are described as follows:
$$C \frac{du(t)}{dt} = -\frac{1}{R}\,u(t) + \left[ i_o(t) - \sum_j w_j\, i_j(t) \right] \qquad \text{(Eqn. 4)}$$
where:
u(t) is the model state variable (corresponding to the neural membrane potential of a biological neuron);
C is the membrane capacitance;
R is the input resistance;
i_o(t) is the external current driving the neural state;
i_j(t) is the input current from the j-th synaptic input; and
w_j represents the strength of the j-th synapse.
When the input resistance R→∞, Eqn. 4 describes the IF model. FIG. 1A illustrates one example of a typical neuron response to stimulation. In both IF and LIF models, a neuron is configured to fire a spike at time tf, whenever the membrane potential u(t) (denoted by the traces 114, 128 in FIG. 1A) reaches a certain value υ, referred to as the firing threshold, denoted by the line 118 in FIG. 1A. Immediately after generating an output spike, the neuron state is reset to a new value ures<υ, and the state is held at that level for a time interval representing the neural absolute refractory period. As illustrated in FIG. 1A, the extended stimulation of the node by the input signal 113 triggers multiple high excitability u(t) events within the node (as shown by the pulsing events 115 in FIG. 1A) that exceed the firing threshold 118. These events 115 result in the generation of the pulse train 116 by the node.
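The LIF dynamics, threshold crossing, reset, and refractory hold described above may be sketched, in one non-limiting example, using forward-Euler integration. The function name simulate_lif, the integration scheme, and all parameter values (time step, capacitance, resistance, threshold, refractory period) are assumptions chosen for illustration and are not specified by the disclosure.

```python
def simulate_lif(i_ext, syn_currents, weights, dt=1e-4, C=1e-9, R=1e7,
                 threshold=0.02, u_reset=0.0, t_refractory=2e-3):
    """Forward-Euler sketch of the LIF unit with threshold, reset, and hold.

    i_ext        -- samples of the external current i_o(t), one per time step
    syn_currents -- one list of i_j(t) samples per synaptic input
    weights      -- synaptic strengths w_j, one per synaptic input
    Returns (u_trace, spike_times).
    """
    u = u_reset
    hold_steps = 0                            # remaining refractory steps
    n_hold = int(round(t_refractory / dt))
    u_trace, spike_times = [], []
    for k, i_o in enumerate(i_ext):
        if hold_steps > 0:
            hold_steps -= 1                   # state held at the reset value
            u = u_reset
        else:
            syn = sum(w * i_j[k] for w, i_j in zip(weights, syn_currents))
            du_dt = (-u / R + (i_o - syn)) / C    # signs follow the LIF equation
            u += dt * du_dt
            if u >= threshold:                # firing threshold reached
                spike_times.append(k * dt)
                u = u_reset
                hold_steps = n_hold
        u_trace.append(u)
    return u_trace, spike_times


# Assumed constant drive: 3 nA settles above the threshold, 1 nA below it.
n = 1000
_, strong_spikes = simulate_lif([3e-9] * n, [], [])
weak_trace, weak_spikes = simulate_lif([1e-9] * n, [], [])
# R -> infinity removes the leak term, giving the IF model.
_, if_spikes = simulate_lif([1e-9] * n, [], [], R=float("inf"))
```

With these assumed values the stronger drive produces a train of output spikes, while the weaker drive leaves the leaky unit sub-threshold; the same weak drive does eventually fire the non-leaky IF unit because nothing discharges the state between inputs.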
Most neural models may be characterized by sub-threshold and super-threshold states. While a sub-threshold stimulus typically only modifies the internal state of a neuron (e.g., increases the membrane potential), a super-threshold stimulus results in (i) a change of the internal state; and (ii) a post-synaptic response by the neuron. That is, super-threshold stimuli cause a neuron to generate output signals (action potentials, spikes) that can further be propagated to other neurons.
Biological neurons communicate with one another through specialized junctions called synapses (see Sherrington, C. S., (1897); The Central Nervous System. In: A Textbook of Physiology, 7th ed., part III, Ed. by Foster M. Macmillan and Co. Ltd., London, p. 929; Sutton R.S. (1988). Learning to predict by the methods of temporal differences. Machine Learning 3(1), 9-44; and Bennett, M. R., (1999), The early history of the synapse: from Plato to Sherrington. Brain Res. Bull., 50(2): 95-118; each of the foregoing incorporated herein by reference in its entirety). Arrival of a pre-synaptic spike (illustrated by the spike train 120 in FIG. 1A) at a synapse provides an input signal i(t) into the post-synaptic neuron. This input signal corresponds to the synaptic electric current flowing into the biological neuron, and may be modeled using an exponential function as follows:
$$i(t) = \int_0^{\infty} S_j(s - t)\, e^{-s/\tau_s}\, ds \qquad \text{(Eqn. 5)}$$
where τ_s is the synaptic time constant, and S_j(t) denotes here a pre-synaptic spike train. A typical response of the synapse model given by Eqn. 5 to a sample input spike train 120 is illustrated by the curve labeled 123 in FIG. 1A. The neuron potential u(t) in response to the spike train 120 is depicted by the line 128 in FIG. 1A.
Similarly to the analog input, the spiking input 120 into a node triggers a synaptic input current, which in an exemplary implementation has a shape of a trace 123. The trace 128 depicts internal state of the node responsive to the synaptic input current 123. As shown in FIG. 1A, a single input pulse 122 of the pulse train 120 does not raise the node state above the firing threshold 118 and, hence, does not cause output spike generation. Pulse groups 124, 126 of the pulse train 120 cause the node state (excitability) to reach the firing threshold and result in the generation of output pulses 132, 134, respectively. A review of exemplary spiking neuron models is provided by Gerstner and Kistler 2002, incorporated by reference supra.
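The exponential synapse response may be sketched in discrete time as a decaying trace that jumps at each pre-synaptic spike. This sketch assumes a causal reading of the synapse model, in which a spike arriving at a given time contributes an exponentially decaying current only afterwards; the function name synaptic_current, the unit jump amplitude, and the parameter values are hypothetical.

```python
import math


def synaptic_current(spike_steps, n_steps, dt=1e-3, tau_s=5e-3):
    """Exponentially filtered spike train (assumed causal exponential synapse).

    spike_steps -- indices of the time steps at which pre-synaptic spikes arrive
    n_steps     -- number of time steps to simulate
    Returns the list of synaptic current samples i(t).
    """
    spikes = set(spike_steps)
    decay = math.exp(-dt / tau_s)    # per-step decay for time constant tau_s
    current, trace = 0.0, []
    for k in range(n_steps):
        current *= decay             # exponential relaxation toward zero
        if k in spikes:
            current += 1.0           # unit jump per spike (assumed amplitude)
        trace.append(current)
    return trace


# A lone pulse versus a closely spaced pulse group (illustrative timing only).
single = synaptic_current([2], 12)
group = synaptic_current([2, 3, 4], 12)
```

In this sketch the pulse group drives the current to a higher peak than the lone pulse, because successive contributions add before the earlier ones have decayed, which mirrors the temporal summation attributed to the pulse groups 124, 126.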
Spiking neural networks offer several benefits over other classes of ANN, including without limitation: greater information and memory capacity, richer repertoire of behaviors (including tonic/phasic spiking, bursting, spike latency, spike frequency adaptation, resonance, threshold variability, input accommodation and bistability), as well as efficient hardware implementations. In many models of ANN, it is assumed that weights comprise parameters that can be adapted. This process of adjusting the weights is commonly referred to as adaptation, “learning” or “training”.
Reinforcement Learning Methods
In machine learning, reinforcement learning refers to the problem in which the goal of learning is explored via interactions between a learning agent and the environment. At each point in time t, the agent performs an action y_t and the environment generates an observation x_t and an instantaneous cost c_t, according to some (usually unknown) dynamics. The aim of reinforcement learning is to discover a policy for selecting actions that minimizes some measure of a long-term cost; i.e., the expected cumulative cost. The environment's dynamics and the long-term cost for each policy are usually unknown, but can be estimated.
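The agent-environment interaction described above may be sketched, purely for illustration, with a two-action environment and an epsilon-greedy agent that estimates the mean cost of each action. Epsilon-greedy selection is a technique substituted here for illustration and is not taken from the disclosure; the functions environment_step and run_agent, the cost values, and the exploration rate are all hypothetical.

```python
import random


def environment_step(action, rng):
    """Hypothetical environment: returns (observation x_t, cost c_t)."""
    mean_cost = (1.0, 0.2)[action]          # assumed dynamics, hidden from agent
    cost = mean_cost + rng.uniform(-0.1, 0.1)
    observation = rng.random()              # placeholder, unused by this agent
    return observation, cost


def run_agent(n_steps=500, epsilon=0.1, seed=0):
    """Epsilon-greedy agent that estimates the mean cost of each action."""
    rng = random.Random(seed)
    estimates = [0.0, 0.0]                  # running mean cost per action
    counts = [0, 0]
    total_cost = 0.0
    for _ in range(n_steps):
        if rng.random() < epsilon:
            action = rng.randrange(2)       # explore: random action y_t
        else:
            action = min(range(2), key=lambda a: estimates[a])  # exploit
        _, cost = environment_step(action, rng)
        counts[action] += 1
        estimates[action] += (cost - estimates[action]) / counts[action]
        total_cost += cost                  # cumulative cost to be minimized
    return estimates, counts, total_cost


estimates, counts, total_cost = run_agent()
```

The agent never observes the environment's dynamics directly; it estimates the cost of each action from experience and, over time, selects the lower-cost action far more often than the alternative.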
Algorithms for reinforcement or reward-based learning in spiking neural networks are typically represented using the following general equation described, for example, by Fremaux, N. et al. (2010), Functional requirements for Reward-Modulated Spike-Timing-Dependent Plasticity, J. of Neuroscience, 30(4):13326-13337; Izhikevich, E. (2007), Solving the Distal Reward Problem through Linkage of STDP and Dopamine Signaling, Cerebral Cortex, 17, 2443-2452; and Legenstein, R., et al. (2008), A learning theory for reward-modulated spike-timing-dependent plasticity with application to biofeedback. PLoS Computational Biology, 4(10):1-27, each incorporated by reference herein in its entirety:
$$\frac{d\theta_{ji}(t)}{dt} = \eta\, R(t)\, e_{ji}(t) \qquad \text{(Eqn. 6)}$$
where:
θ_ji(t) is an adapted parameter of a synaptic connection between the pre-synaptic neuron i and the post-synaptic neuron j;
η is a parameter referred to as the learning rate that scales the θ-changes enforced by learning; η can be a constant parameter or it can be a function of some other system parameters;
R(t) is a function describing the reward signal; and
e_ji(t) is the eligibility trace, configured to characterize correlation between pre-synaptic and post-synaptic activity.
Most existing learning algorithms based on Eqn. 6 may modify adaptive parameters θ only when the reward signal R(t) is nonzero, and both the pre-synaptic and the post-synaptic neurons are active. Accordingly, when either of these neurons is ‘silent’ (i.e., is not generating spikes), existing methods may provide no adaptation of the associated synaptic connections.
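The limitation noted above may be illustrated with a minimal sketch of a reward-modulated update in which the parameter change is the product of the learning rate, the reward, and the eligibility trace. The eligibility trace is assumed here to be a decaying trace driven by coincident pre-synaptic and post-synaptic spikes; that form, the function name reward_modulated_update, and the parameter values are assumptions rather than the method of any cited reference.

```python
def reward_modulated_update(theta, pre_spikes, post_spikes, reward,
                            eta=0.01, dt=1.0, tau_e=20.0):
    """Forward-Euler sketch of d(theta)/dt = eta * R(t) * e(t).

    pre_spikes, post_spikes -- 0/1 samples of pre- and post-synaptic activity
    reward                  -- samples of the reward signal R(t)
    The eligibility trace e(t) decays with time constant tau_e and is
    incremented on pre/post coincidences (assumed form).
    """
    e = 0.0
    for pre, post, r in zip(pre_spikes, post_spikes, reward):
        e += dt * (-e / tau_e) + pre * post   # coincidence-driven trace
        theta += dt * eta * r * e             # reward-gated parameter change
    return theta


pre = [1, 0, 1, 0]
reward = [0.0, 1.0, 0.0, 1.0]
theta_active = reward_modulated_update(0.5, pre, [1, 0, 1, 0], reward)
theta_silent = reward_modulated_update(0.5, pre, [0, 0, 0, 0], reward)
theta_no_reward = reward_modulated_update(0.5, pre, [1, 0, 1, 0], [0.0] * 4)
```

In this sketch the parameter changes only in the first case. When the post-synaptic neuron is silent the eligibility trace stays at zero, and when the reward is zero the product vanishes, so the connection is left unmodified in both remaining cases.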
Based on the foregoing, there is a salient need for apparatus and methods capable of efficiently implementing exploration (e.g., activation of silent neurons) during a learning process, so that new possible solutions to the learning problem can be explored. This is a pertinent problem and an unsatisfied need not only in the context of reinforcement learning, but also for supervised and unsupervised learning.