1. Field of the Disclosure
The present disclosure relates to implementing generalized learning rules in stochastic spiking neuron systems.
2. Description of Related Art
Adaptive signal processing systems are well known in the arts of computerized control and information processing. One typical configuration of an adaptive system of prior art is shown in FIG. 1. The system 100 may be capable of changing or “learning” its internal parameters based on the input 102, output 104 signals, and/or an external influence 106. The system 100 may be commonly described using a function 110 that depends (including probabilistic dependence) on the history of inputs and outputs of the system and/or on some external signal r that is related to the inputs and outputs. The function F(x,y,r) may be referred to as a “performance function”. The purpose of adaptation (or learning) may be to optimize the input-output transformation according to some criteria, where learning is described as minimization of an average value of the performance function F.
Although there are numerous models of adaptive systems, these typically implement a specific set of learning rules (e.g., supervised, unsupervised, reinforcement). Supervised learning may be the machine learning task of inferring a function from supervised (labeled) training data. Reinforcement learning may refer to an area of machine learning concerned with how an agent ought to take actions in an environment so as to maximize some notion of reward (e.g., immediate or cumulative). Unsupervised learning may refer to the problem of trying to find hidden structure in unlabeled data. Because the examples given to the learner are unlabeled, there is no external signal to evaluate a potential solution.
When the task changes, the learning rules (typically effected by adjusting the control parameters w={w1, w2, . . . , wn}) may need to be modified to suit the new task. Hereinafter, boldface variables and symbols with arrow superscripts denote vector quantities, unless specified otherwise. Complex control applications, such as, for example, autonomous robot navigation, robotic object manipulation, and/or other applications, may require simultaneous implementation of a broad range of learning tasks. Such tasks may include visual recognition of surroundings, motion control, object (face) recognition, object manipulation, and/or other tasks. In order to handle these tasks simultaneously, existing implementations may rely on a partitioning approach, where individual tasks are implemented using separate controllers, each implementing its own learning rule (e.g., supervised, unsupervised, reinforcement).
One conventional implementation of a multi-task learning controller is illustrated in FIG. 1A. The apparatus 120 comprises several blocks 122, 124, 130, each implementing a set of learning rules tailored for the particular task (e.g., motor control, visual recognition, object classification and manipulation, respectively). Some of the blocks (e.g., the signal processing block 130 in FIG. 1A) may further comprise sub-blocks (e.g., the blocks 132, 134) targeted at different learning tasks. Implementation of the apparatus 120 may have several shortcomings stemming from each block having a task-specific implementation of learning rules. By way of example, a recognition task may be implemented using supervised learning, while object manipulation tasks may comprise reinforcement learning. Furthermore, a single task may require the use of more than one rule (e.g., the signal processing task of block 130 in FIG. 1A), thereby necessitating the use of two separate sub-blocks (e.g., blocks 132, 134), each implementing a different learning rule (e.g., unsupervised learning and supervised learning, respectively).
Artificial neural networks may be used to solve some of the described problems. An artificial neural network (ANN) may include a mathematical and/or computational model inspired by the structure and/or functional aspects of biological neural networks. A neural network comprises a group of artificial neurons (units) that are interconnected by synaptic connections. Typically, an ANN is an adaptive system that is configured to change its structure (e.g., the connection configuration and/or neuronal states) based on external or internal information that flows through the network during the learning phase.
A spiking neuronal network (SNN) may be a special class of ANN, where neurons communicate by sequences of spikes. SNNs may offer improved performance over conventional technologies in areas which include machine vision, pattern detection and pattern recognition, signal filtering, data segmentation, data compression, data mining, system identification and control, optimization and scheduling, and/or complex mapping. The spike generation mechanism may be a discontinuous process (e.g., as illustrated by the input spikes sx(t) 220, 222, 224, 226, 228, and the output spikes sy(t) 230, 232, 234 in FIG. 2), and the classical derivative of a function F(s(t)) with respect to the spike trains sx(t), sy(t) is not defined.
Even when a neural network is used as the computational engine for these learning tasks, individual tasks may be performed by a separate network partition that implements a task-specific set of learning rules (e.g., adaptive control, classification, recognition, prediction rules, and/or other rules). Unused portions of individual partitions (e.g., motor control when the robotic device is stationary) may remain unavailable to other partitions of the network that may require increased processing resources (e.g., when the stationary robot is performing face recognition tasks). Furthermore, when the learning tasks change during system operation, such partitioning may prevent dynamic retargeting (e.g., of the motor control task to visual recognition task) of the network partitions. Such solutions may lead to expensive and/or over-designed networks, in particular when individual portions are designed using the “worst possible case scenario” approach. Similarly, partitions designed using a limited resource pool configured to handle an average task load may be unable to handle infrequently occurring high computational loads that are beyond a performance capability of the particular partition, even when other portions of the networks have spare capacity.
By way of illustration, consider a mobile robot controlled by a neural network, where the task of the robot is to move in an unknown environment and collect certain resources by way of trial and error. This can be formulated as a reinforcement learning task, where the network is supposed to maximize a reward signal (e.g., the amount of the collected resource). While in general the environment is unknown, there may be situations when a human operator can show the network a desired control signal (e.g., for avoiding obstacles) during the ongoing reinforcement learning. This may be formulated as a supervised learning task. Some existing learning rules for supervised learning may rely on the gradient of the performance function. The gradient for the reinforcement learning part may be implemented through the use of an adaptive critic; the gradient for supervised learning may be implemented by taking the difference between the supervisor signal and the actual output of the controller. Introduction of the critic may be unnecessary for solving reinforcement learning tasks, because direct gradient-based reinforcement learning may be used instead. Additional analytic derivation of the learning rules may be needed whenever the loss function between the supervised and actual output signals is redefined.
While different types of learning may be formalized as a minimization of the performance function F, an optimal minimization solution often cannot be found analytically, particularly when relationships between the system's behavior and the performance function are complex. By way of example, nonlinear regression applications generally may not have analytical solutions. Likewise, in motor control applications, it may not be feasible to analytically determine the reward arising from external environment of the robot, as the reward typically may be dependent on the current motor control command and state of the environment.
Moreover, analytic determination of the derivative of a performance function F may require additional operations (often performed manually) for each newly formulated task, making this approach unsuitable for the dynamic switching and reconfiguration of tasks described above.
Some existing approaches to taking a derivative of a performance function without analytic calculations may include a “brute force” finite-difference estimator of the gradient. However, such estimators may be impractical for use with large spiking networks comprising many (typically in excess of hundreds of) parameters.
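The scaling problem can be seen in a minimal sketch (function names are hypothetical, and the performance function here is a stand-in quadratic): a central-difference estimator needs two evaluations of F per parameter, so each gradient step costs 2n full network evaluations.

```python
import numpy as np

def finite_difference_gradient(F, w, eps=1e-4):
    """Brute-force central-difference estimate of dF/dw.

    Requires 2 * len(w) evaluations of the performance function F,
    which becomes impractical when a spiking network has hundreds
    or thousands of parameters.
    """
    grad = np.zeros_like(w)
    for i in range(len(w)):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[i] += eps
        w_minus[i] -= eps
        grad[i] = (F(w_plus) - F(w_minus)) / (2.0 * eps)
    return grad

# Stand-in performance function with known gradient 2w:
F = lambda w: float(np.sum(w ** 2))
g = finite_difference_gradient(F, np.array([1.0, -2.0, 0.5]))
```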
Derivative-free methods exist, specifically the Score Function (SF) method, also known as the Likelihood Ratio (LR) method. In order to determine the direction of steepest descent, these methods may sample the value of F(x,y) at different points of the parameter space according to some probability distribution. Instead of calculating the derivative of the performance function F(x,y), the SF and LR methods utilize the derivative of the sampling probability distribution. This process can be considered an exploration of the parameter space.
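As a rough illustration of the SF/LR idea (not taken from the disclosure; the Gaussian sampling distribution, the baseline, and all names below are assumptions), the gradient of an expected performance can be estimated purely from evaluations of F by differentiating the log of the sampling density rather than F itself:

```python
import numpy as np

def score_function_gradient(F, w, sigma=0.1, n_samples=5000, seed=0):
    """Score-function (likelihood-ratio) estimate of the gradient of
    E_{z ~ N(w, sigma^2 I)}[F(z)] with respect to w.

    Only evaluations of F are needed (F may be non-differentiable);
    the derivative is taken of the Gaussian sampling density:
        d/dw log p(z; w) = (z - w) / sigma^2.
    F(w) is subtracted as a baseline, a common variance-reduction
    device that leaves the estimator unbiased.
    """
    rng = np.random.default_rng(seed)
    baseline = F(w)
    grad = np.zeros_like(w)
    for _ in range(n_samples):
        z = w + sigma * rng.standard_normal(w.shape)       # exploration
        grad += (F(z) - baseline) * (z - w) / sigma ** 2   # LR weighting
    return grad / n_samples

# Stand-in performance function with known gradient 2w:
F = lambda z: float(np.sum(z ** 2))
g = score_function_gradient(F, np.array([1.0, -1.0]))
```

For this quadratic the true gradient at w = (1, −1) is (2, −2), which the sampled estimate approaches as the number of samples grows.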
Although some adaptive controller implementations may describe reward-modulated unsupervised learning algorithms, these implementations of unsupervised learning may be multiplicatively modulated by the reinforcement learning signal and, therefore, may require the presence of the reinforcement signal for proper operation.
Many presently available implementations of stochastic adaptive apparatuses may be incapable of learning to perform unsupervised tasks while being influenced by additive reinforcement (and vice versa). Many presently available adaptive implementations may be task-specific, implementing one particular learning rule (e.g., classifier unsupervised learning), and such devices invariably require retargeting (e.g., reprogramming) in order to implement different learning rules. Furthermore, presently available methodologies may not be capable of implementing generalized learning, where a combination of different learning rules (e.g., reinforcement, supervised, and unsupervised) is used simultaneously for the same application (e.g., platform motion stabilization), thereby enabling, for example, faster learning convergence, better response to sudden changes, and/or improved overall stability, particularly in the presence of noise.
Stochastic Spiking Neuron Models
Where certain elements of these implementations can be partially or fully implemented using known components, only those portions of such known components that are necessary for an understanding of the present disclosure will be described, and detailed descriptions of other portions of such known components will be omitted so as not to obscure the disclosure.
Learning rules used with spiking neuron networks may typically be expressed in terms of the original spike trains instead of their secondary features (e.g., the rate or the latency from the last spike). The result is that a spiking neuron operates on spike train space, transforming a vector of spike trains (the input spike trains) into a single element of that space (the output train). Dealing with spike trains directly may be a challenging task: not every spike train can be transformed to another spike train in a continuous manner. One common approach is to describe the task in terms of optimization of some function and then use gradient approaches in the parameter space of the spiking neuron. However, gradient methods on discontinuous spaces such as spike train space are not well developed. One approach may involve smoothing the spike trains first. Here, output spike trains are smoothed by introducing a probabilistic measure on the spike train space. Describing the spike pattern from a probabilistic point of view may lead to fruitful connections with a wide range of topics within information theory, machine learning, Bayesian inference, statistical data analysis, etc. This approach makes spiking neurons good candidates for SF/LR learning methods.
One technique frequently used when constructing learning rules in a spiking network comprises application of a random exploration process to the spike generation mechanism of a spiking neuron. This is often implemented by introducing a noisy threshold: the probability of spike generation may depend on the difference between the neuron's membrane voltage and a threshold value. The usage of probabilistic spiking neuron models, in order to obtain the gradient of the log-likelihood of a spike train with respect to the neuron's weights, may comprise an extension of the Hebbian learning framework to spiking neurons. The use of the log-likelihood gradient of a spike train may be extended to supervised learning. In some approaches, an information theory framework may be applied to spiking neurons, as, for example, when deriving optimal learning rules for unsupervised learning tasks via informational entropy minimization.
An application of the OLPOMDM algorithm to the solution of reinforcement learning problems with simplified spiking neurons has been demonstrated, and the algorithm has been extended to more biologically plausible neuron models. However, the OLPOMDM algorithm has not been generalized for use with unsupervised and supervised learning in spiking neurons. An application of reinforcement learning ideas to supervised learning has been described; however, only heuristic algorithms without convergence guarantees have been used.
For a neuron, the probability of an output spike train, y, having spikes at times t_f and no spikes at the other times on a time interval [0, T], given the input spikes, x, may be given by the conditional probability density function p(y|x) as:
p(y|x) = Π_{t_f} λ(t_f) · exp(−∫_0^T λ(τ) dτ)  (Eqn. 1)

where λ(t) represents an instantaneous probability density (“hazard”) of firing.
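Eqn. 1 is straightforward to evaluate in discrete time; the helper below is an illustrative sketch (names are not from the disclosure), with the product over spike times handled in the log domain and the integral approximated by a Riemann sum:

```python
import numpy as np

def spike_train_log_likelihood(lam, spike_idx, dt):
    """Discrete-time evaluation of the log of Eqn. 1:

        log p(y|x) = sum_f log lam(t_f) - integral_0^T lam(tau) dtau

    `lam` is the instantaneous firing density sampled on a grid with
    step `dt`; `spike_idx` holds the grid indices of the output spikes.
    """
    point_term = np.sum(np.log(lam[spike_idx]))   # product over spike times
    integral_term = np.sum(lam) * dt              # no-spike (survival) term
    return point_term - integral_term

# Constant hazard of 5 Hz over [0, 1] s with two output spikes:
dt = 0.001
lam = np.full(1000, 5.0)
ll = spike_train_log_likelihood(lam, np.array([100, 600]), dt)
```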
The instantaneous probability density of the neuron can depend on the neuron's state q(t): λ(t)≡λ(q(t)). For example, for continuous time it can be defined according to the membrane voltage u(t), chosen as an exponential stochastic threshold:

λ(t) = λ0 exp(κ(u(t) − θ))  (Eqn. 2)

where u(t) is the membrane voltage of the neuron, θ is the voltage threshold for generating a spike, κ is the probabilistic parameter, and λ0 is the basic (spontaneous) firing rate of the neuron.
Some approaches utilize a sigmoidal stochastic threshold, expressed as:

λ(t) = λ0 / (1 + exp(−κ(u(t) − θ)))  (Eqn. 3)

or an exponential-linear stochastic threshold:

λ(t) = λ0 ln(1 + exp(κ(u(t) − θ)))  (Eqn. 4)

where λ0, κ, θ are parameters with a meaning similar to those of the exponential threshold model of Eqn. 2.
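For reference, the three stochastic threshold functions of Eqns. 2–4 can be written out directly; the parameter values below are arbitrary placeholders, not values from the disclosure:

```python
import numpy as np

lam0, kappa, theta = 1.0, 2.0, 1.0    # placeholder parameter values

def hazard_exponential(u):
    # Eqn. 2: lam(t) = lam0 * exp(kappa * (u - theta))
    return lam0 * np.exp(kappa * (u - theta))

def hazard_sigmoidal(u):
    # Eqn. 3: lam(t) = lam0 / (1 + exp(-kappa * (u - theta)))
    return lam0 / (1.0 + np.exp(-kappa * (u - theta)))

def hazard_exp_linear(u):
    # Eqn. 4: lam(t) = lam0 * ln(1 + exp(kappa * (u - theta)))
    return lam0 * np.log1p(np.exp(kappa * (u - theta)))

# At u = theta the three thresholds evaluate to lam0, lam0/2, lam0*ln 2:
vals = (hazard_exponential(1.0), hazard_sigmoidal(1.0), hazard_exp_linear(1.0))
```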
Models of the stochastic threshold exist comprising refractory mechanisms that modulate the instantaneous probability of firing after the last output spike: λ(t) = λ̂(t)·R(t − tlastout), where λ̂(t) is the original stochastic threshold function (such as the exponential one or other), and R(t − tlastout) is a dynamic refractory coefficient that depends on the time elapsed since the last output spike tlastout.
For discrete time steps, an approximation for the probability Λ(u(t)) ∈ (0,1] of firing in the current time step may be given by:

Λ(u(t)) = 1 − exp(−λ(u(t))Δt)  (Eqn. 5)

where Δt is the time step length.
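Eqn. 5 maps the unbounded hazard λ onto a valid per-step probability; a small numeric sketch (illustrative values) shows that for small λΔt the probability approaches λΔt itself, while for large λΔt it saturates at 1:

```python
import numpy as np

def firing_probability(lam_u, dt):
    # Eqn. 5: probability of at least one spike in a step of length dt,
    # given the instantaneous density lam(u(t)); always in (0, 1].
    return 1.0 - np.exp(-lam_u * dt)

p_small = firing_probability(10.0, 0.001)   # ~0.00995, close to lam * dt
p_large = firing_probability(10.0, 10.0)    # saturates near 1
```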
In one-dimensional deterministic spiking models, such as Integrate-and-Fire (IF), Quadratic Integrate-and-Fire (QIF), and others, the membrane voltage u(t) is the only state variable (q(t)≡u(t)) that is “responsible” for spike generation through a deterministic threshold mechanism. There also exist many more complex multidimensional spiking models. For example, a simple spiking model may comprise two state variables, where only one of them is compared with a threshold value. However, even detailed neuron models may be parameterized using a single variable (e.g., an equivalent of the “membrane voltage” of a biological neuron) and use it with a suitable threshold in order to determine the presence of a spike. Such models are often extended to describe stochastic neurons by replacing the deterministic threshold with a stochastic one.
Generalized dynamics equations for spiking neuron models are often expressed as a superposition of the input, the interaction between the input current and the neuronal state variables, and the neuron reset after a spike, as follows:

dq/dt = V(q) + Σ_{tout} R(q) δ(t − tout) + G(q) Iext  (Eqn. 6)

where q is a vector of internal state variables (e.g., comprising the membrane voltage); Iext is the external input to the neuron; V is the function that defines the evolution of the state variables; G describes the interaction between the input current and the state variables (for example, to model synaptic depletion); and R describes resetting the state variables after the output spikes at tout.
For example, for the IF model the state vector and the state model may be expressed as:

q ≡ u(t); V(q) = −Cu; R(q) = ures − u; G(q) = 1,  (Eqn. 7)

where C is a membrane constant, and ures is the value to which the voltage is set after an output spike (the reset value). Accordingly, Eqn. 6 becomes:
du/dt = −Cu + Σ_{tout} (ures − u) δ(t − tout) + Iext  (Eqn. 8)
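Combining Eqn. 8 with the exponential stochastic threshold of Eqn. 2 and the discrete-time firing probability of Eqn. 5 gives a complete stochastic IF neuron. The sketch below uses simple Euler integration; all parameter values and names are placeholders, not values from the disclosure:

```python
import numpy as np

def simulate_stochastic_if(I_ext, dt=0.001, C=0.1, u_res=0.0,
                           lam0=1.0, kappa=5.0, theta=1.0, seed=0):
    """Euler integration of the IF dynamics of Eqn. 8,
    du/dt = -C*u + I_ext, with the deterministic threshold replaced
    by the exponential stochastic threshold of Eqn. 2 and the
    per-step firing probability of Eqn. 5.
    """
    rng = np.random.default_rng(seed)
    u, spikes = 0.0, []
    for k, I in enumerate(I_ext):
        u += dt * (-C * u + I)                       # membrane dynamics
        lam = lam0 * np.exp(kappa * (u - theta))     # Eqn. 2 hazard
        if rng.random() < 1.0 - np.exp(-lam * dt):   # Eqn. 5
            spikes.append(k * dt)
            u = u_res                                # reset term of Eqn. 8
    return np.array(spikes)

# Constant drive for 2 s; the hazard grows with u until a spike resets it.
spikes = simulate_stochastic_if(np.full(2000, 2.0))
```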
For some simple neuron models, Eqn. 6 may be expressed as:
dv/dt = 0.04v² + 5v + 140 − u + Σ_{tout} (c − v) δ(t − tout) + Iext

du/dt = a(bv − u) + d Σ_{tout} δ(t − tout)  (Eqn. 9)

where:

q = (v(t), u(t))ᵀ; V(q) = (0.04v(t)² + 5v(t) + 140 − u(t), a(bv(t) − u(t)))ᵀ; R(q) = (c − v(t), d)ᵀ; G(q) = (1, 0)ᵀ,  (Eqn. 10)

and a, b, c, d are parameters of the model.
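The two-variable model of Eqn. 9 can likewise be integrated with the Euler method. The sketch below applies the reset terms R(q) deterministically when v reaches a spike peak; the (a, b, c, d) values and the 30 mV peak are the values commonly associated with this model in the literature, which the disclosure itself leaves unspecified:

```python
import numpy as np

def simulate_two_variable_model(I_ext, dt=0.5, a=0.02, b=0.2,
                                c=-65.0, d=8.0, v_peak=30.0):
    """Euler integration of the two-variable dynamics of Eqn. 9.

    When v reaches v_peak, the reset terms of R(q) are applied:
    v is set to c and u is incremented by d.
    """
    v, u = c, b * c
    spikes = []
    for k, I in enumerate(I_ext):
        v += dt * (0.04 * v ** 2 + 5.0 * v + 140.0 - u + I)
        u += dt * a * (b * v - u)
        if v >= v_peak:                  # spike: apply reset terms R(q)
            spikes.append(k * dt)
            v, u = c, u + d
    return np.array(spikes)

# Constant drive for 1 s (2000 steps at dt = 0.5 ms) yields tonic firing.
spikes = simulate_two_variable_model(np.full(2000, 10.0))
```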
Accordingly, there is a salient need for machine learning apparatus and methods implementing generalized stochastic learning in spiking networks, configured to handle simultaneously any combination of learning rules (e.g., reinforcement, supervised, unsupervised, online, batch) and capable of, inter alia, dynamic reconfiguration using the same set of network resources.