1. Field of the Disclosure
The present disclosure relates to implementing generalized learning rules in stochastic systems.
2. Description of Related Art
Adaptive signal processing systems are well known in the arts of computerized control and information processing. One typical configuration of an adaptive system is shown in FIG. 1. The system 100 may be capable of changing or “learning” its internal parameters based on the input signals 102, the output signals 104, and/or an external influence 106. The system 100 may be commonly described using a function 110 that depends (including probabilistic dependence) on the history of inputs and outputs of the system and/or on some external signal r that is related to the inputs and outputs. The function F(x,y,r) may be called a “performance function”. The purpose of adaptation (or learning) may be to optimize the input-output transformation according to some criteria, where learning is described as minimization of an average value of the performance function F.
Although there are numerous models of adaptive systems, these typically implement a specific set of learning rules (e.g., supervised, unsupervised, reinforcement). Supervised learning may be the machine learning task of inferring a function from supervised (labeled) training data. Reinforcement learning may refer to an area of machine learning concerned with how an agent ought to take actions in an environment so as to maximize some notion of reward (e.g., immediate or cumulative). Unsupervised learning may refer to the problem of trying to find hidden structure in unlabeled data. Because the examples given to the learner are unlabeled, there is no external signal to evaluate a potential solution.
When the task changes, the learning rules (typically effected by adjusting the control parameters w={w1, w2, . . . , wn}) may need to be modified to suit the new task. Hereinafter, the boldface variables and symbols with arrow superscripts denote vector quantities, unless specified otherwise. Complex control applications, such as, for example, autonomous robot navigation, robotic object manipulation, and/or other applications, may require simultaneous implementation of a broad range of learning tasks. Such tasks may include visual recognition of surroundings, motion control, object (face) recognition, object manipulation, and/or other tasks. In order to handle these tasks simultaneously, existing implementations may rely on a partitioning approach, where individual tasks are implemented using separate controllers, each implementing its own learning rule (e.g., supervised, unsupervised, reinforcement).
One typical implementation of a prior-art multi-task learning controller is illustrated in FIG. 1A. The apparatus 120 comprises several blocks 120, 124, 130, each implementing a set of learning rules tailored for the particular task (e.g., motor control, visual recognition, object classification and manipulation, respectively). Some of the blocks (e.g., the signal processing block 130 in FIG. 1A) may further comprise sub-blocks (e.g., the blocks 132, 134) targeted at different learning tasks. Implementation of the apparatus 120 may have several shortcomings stemming from each block having a task-specific implementation of learning rules. By way of example, a recognition task may be implemented using supervised learning, while object manipulation tasks may comprise reinforcement learning. Furthermore, a single task may require use of more than one rule (e.g., the signal processing task for block 130 in FIG. 1A), thereby necessitating use of two separate sub-blocks (e.g., blocks 132, 134), each implementing a different learning rule (e.g., unsupervised learning and supervised learning, respectively).
Artificial neural networks may be used to solve some of the described problems. An artificial neural network (ANN) may include a mathematical and/or computational model inspired by the structure and/or functional aspects of biological neural networks. A neural network comprises a group of artificial neurons (units) that are interconnected by synaptic connections. Typically, an ANN is an adaptive system that is configured to change its structure (e.g., the connection configuration and/or neuronal states) based on external or internal information that flows through the network during the learning phase.
A spiking neuronal network (SNN) may be a special class of ANN, where neurons communicate by sequences of spikes. SNNs may offer improved performance over conventional technologies in areas which include machine vision, pattern detection and pattern recognition, signal filtering, data segmentation, data compression, data mining, system identification and control, optimization and scheduling, and/or complex mapping. The spike generation mechanism may be a discontinuous process (e.g., as illustrated by the pre-synaptic spikes sx(t) 220, 222, 224, 226, 228, and post-synaptic spike train sy(t) 230, 232, 234 in FIG. 2), so that a classical derivative of function F(s(t)) with respect to spike trains sx(t), sy(t) is not defined.
Even when a neural network is used as the computational engine for these learning tasks, individual tasks may be performed by a separate network partition that implements a task-specific set of learning rules (e.g., adaptive control, classification, recognition, prediction rules, and/or other rules). Unused portions of individual partitions (e.g., motor control when the robotic device is stationary) may remain unavailable to other partitions of the network that may require increased processing resources (e.g., when the stationary robot is performing face recognition tasks). Furthermore, when the learning tasks change during system operation, such partitioning may prevent dynamic retargeting (e.g., of the motor control task to visual recognition task) of the network partitions. Such solutions may lead to expensive and/or over-designed networks, in particular when individual portions are designed using the “worst possible case scenario” approach. Similarly, partitions designed using a limited resource pool configured to handle an average task load may be unable to handle infrequently occurring high computational loads that are beyond a performance capability of the particular partition, even when other portions of the network have spare capacity.
By way of illustration, consider a mobile robot controlled by a neural network, where the task of the robot is to move in an unknown environment and collect certain resources by way of trial and error. This can be formulated as a reinforcement learning task, where the network is supposed to maximize the reward signals (e.g., amount of the collected resource). While in general the environment is unknown, there may be situations when the human operator can show the network a desired control signal (e.g., for avoiding obstacles) during the ongoing reinforcement learning. This may be formulated as a supervised learning task. Some existing learning rules for supervised learning may rely on the gradient of the performance function. The gradient for the reinforcement learning part may be implemented through the use of the adaptive critic; the gradient for supervised learning may be implemented by taking a difference between the supervisor signal and the actual output of the controller. Introduction of the critic may be unnecessary for solving reinforcement learning tasks, because direct gradient-based reinforcement learning may be used instead. Analytic derivation of the learning rules may further be required when the loss function between the supervised and the actual output signal is redefined.
While different types of learning may be formalized as a minimization of the performance function F, often an optimal minimization solution cannot be found analytically, particularly when relationships between the system's behavior and the performance function are complex. By way of example, nonlinear regression applications generally may not have analytical solutions. Likewise, in motor control applications, it may not be feasible to analytically determine the reward arising from the external environment of the robot, as the reward typically may be dependent on the current motor control command and state of the environment. Moreover, analytic determination of a derivative of the performance function F may require additional operations (often performed manually) for each newly formulated task, which is not suitable for the dynamic switching and reconfiguration of tasks described above.
Some of the existing approaches to taking a derivative of a performance function without analytic calculations may include a “brute force” finite difference estimator of the gradient. However, because the number of performance function evaluations grows with the number of parameters, these estimators may be impractical for use with large spiking networks comprising many parameters (typically in excess of hundreds).
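By way of a non-limiting illustration, a finite difference estimator of the kind referred to above may be sketched as follows. The function names, the step size eps, and the quadratic test function are assumptions introduced solely for this example and do not appear in the disclosure. The sketch shows that each gradient estimate requires two evaluations of the performance function per parameter.

```python
def finite_difference_gradient(F, w, eps=1e-5):
    """Central-difference estimate of the gradient of F at w.

    F   : callable mapping a list of parameters to a scalar (hypothetical)
    w   : list of parameter values
    eps : perturbation size (assumed value, chosen for illustration)
    Requires 2 * len(w) evaluations of F.
    """
    grad = []
    for i in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[i] += eps
        w_minus[i] -= eps
        grad.append((F(w_plus) - F(w_minus)) / (2.0 * eps))
    return grad


def quadratic(w):
    """Illustrative performance function with its minimum at w_i = 1."""
    return sum((wi - 1.0) ** 2 for wi in w)


grad = finite_difference_gradient(quadratic, [0.0, 2.0, 1.0])
```

For a network with n adjustable parameters, one such estimate may cost 2n evaluations of F, which is consistent with the scaling concern noted above.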
Derivative-free methods exist, specifically the Score Function (SF) method, also known as the Likelihood Ratio (LR) method. In order to determine a direction of the steepest descent, these methods may sample the value of F(x,y) at different points of the parameter space according to some probability distribution. Instead of calculating the derivative of the performance function F(x,y), the SF and LR methods utilize a derivative of the sampling probability distribution. This process can be considered an exploration of the parameter space.
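By way of a non-limiting illustration, a score function estimate for a single stochastic unit may be sketched as follows. The Bernoulli output with a sigmoid firing probability, the function names, and the sample count are assumptions made solely for this example. The sketch evaluates F only at sampled outputs and differentiates the logarithm of the sampling probability rather than F itself.

```python
import math
import random


def sigmoid(w):
    return 1.0 / (1.0 + math.exp(-w))


def score_function_gradient(F, w, n_samples=10000, rng=None):
    """Monte Carlo estimate of d/dw E[F(y)], where y ~ Bernoulli(sigmoid(w)).

    F is treated as a black box; only the derivative of log P(y | w) is used.
    All names and the Bernoulli/sigmoid model are illustrative assumptions.
    """
    rng = rng or random.Random(0)
    p = sigmoid(w)
    total = 0.0
    for _ in range(n_samples):
        y = 1 if rng.random() < p else 0   # sample an output (exploration)
        score = y - p                      # d/dw log P(y | w) for this model
        total += F(y) * score
    return total / n_samples


# Illustrative performance function: a cost of 1 whenever the unit is silent.
estimate = score_function_gradient(lambda y: 1.0 - y, w=0.0, n_samples=20000)
```

In this example the expected cost is 1 - sigmoid(w), so the exact derivative at w = 0 is -0.25, and the sampled estimate may be expected to approach that value as the number of samples increases.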
Although some adaptive controller implementations may describe reward-modulated unsupervised learning algorithms, these implementations of unsupervised learning algorithms may be multiplicatively modulated by a reinforcement learning signal and, therefore, may require the presence of the reinforcement signal for proper operation.
Many presently available implementations of stochastic adaptive apparatuses may be incapable of learning to perform unsupervised tasks while being influenced by additive reinforcement (and vice versa). Many presently available adaptive implementations may be task-specific and implement one particular learning rule (e.g., classifier unsupervised learning), and such devices invariably require retargeting (e.g., reprogramming) in order to implement different learning rules. Furthermore, presently available methodologies may not be capable of implementing generalized learning, where a combination of different learning rules (e.g., reinforcement, unsupervised and supervised) is used simultaneously for the same application (e.g., platform motion stabilization), in order to obtain, for example, faster learning convergence, better response to sudden changes, and/or improved overall stability, particularly in the presence of noise.
Accordingly, there is a salient need for machine learning apparatus and methods that implement generalized stochastic learning, are configured to handle simultaneously any learning rule combination (e.g., reinforcement, supervised, unsupervised, online, batch), and are capable of, inter alia, dynamic reconfiguration using the same set of network resources.