1. Field of the Invention
The present invention generally relates to neural networks, and more particularly, to recurrent neural networks for estimation (including prediction) and/or control.
2. Background Description
This invention is concerned with the problems of optimal, or approximately optimal, estimation and control. In a standard formulation of these problems, there is an external system or “plant” described at each of a plurality of times by a plant state that evolves over time according to a stochastic plant process, and there is a stochastic measurement process that produces measurements at each of a plurality of times. The optimal estimation problem consists of using the measurements to generate estimates of the plant states over time, so as to minimize a specified measure of the error between the estimated and actual plant states. The term “estimation” can refer to prediction, filtering, and/or smoothing, as described below.
The optimal control problem consists of using the measurements, or estimates thereof, to generate control signals that have the effect of altering the plant state in such a manner as to minimize a specified measure of the error between the actual and a specified desired (or target) plant state at one or more future times. A function, called the “cost-to-go”, is typically specified. This function describes the cost of a process that generates and applies a succession of control signals to the plant, resulting in a succession of plant states over time. Given the current plant state or an estimate thereof, or a measurement vector that conveys information about the plant state, it is desired to generate a sequence of control signal outputs such that the cost-to-go is substantially minimized. The cost is typically a combination of terms that reflect the cost of generating control output actions, and the cost (or benefit, with a minus sign applied) of the plant state approaching or reaching a desired target state (or succession of such target states).
A classical method for treating optimal estimation and control problems is the Kalman filter or extended Kalman filter. The Kalman filter (KF) was first described in R. E. Kalman, Trans. ASME, Series D, Journal of Basic Engineering, Vol. 82 (1960), pp. 35-45. It solves the problems of optimal estimation and control when the plant and measurement processes satisfy certain conditions of linearity. The KF method also assumes knowledge of several types of parameters. It assumes that the deterministic portions of both the plant evolution (over time) process, and the measurement process (giving the relationship between the plant state and the measurement vector), are known; and that the noise covariances of both the plant evolution and measurement processes are also known. When the linearity condition is not strictly satisfied, the extended Kalman filter method (EKF) can be used to generate a linearized approximation of the plant and measurement processes. The EKF may be iteratively applied to obtain a sequence of linearized approximations. When the parameters of the plant evolution and measurement processes are not known, one can use a separate computation, “system identification”, to determine or estimate these parameters.
Kalman Optimal Estimation and Control
The state of an external system (the “plant”) governed by linear dynamical equations is described by a vector $x_t$, which evolves in discrete time according to
$$x_{t+1} = Fx_t + Bu_t + m_t \tag{1}$$
where the matrix $F$ describes the noise-free evolution of $x$, $m$ denotes an additive plant noise vector, $u$ denotes an optional vector that may be generated as a control output signal by a controller system, and matrix $B$ describes the effect of $u$ on the plant state $x$. Sensors measure a linear function of the system state, with additive measurement noise $n$:
$$y_t = Hx_t + n_t. \tag{2}$$
In classical Kalman estimation (e.g., the above-cited reference), it is assumed that the matrices $H$ and $F$ are known, and that the covariance matrices $Q=\mathrm{Cov}(m)$ of the plant noise, and $R=\mathrm{Cov}(n)$ of the measurement noise, are also known. Here $E(\ldots)$ denotes expectation value, $\mathrm{Cov}(z)\equiv E[(z-\bar{z})(z-\bar{z})']$, $\bar{z}$ is the mean of $z$, the prime symbol denotes matrix transpose, $I$ will denote the identity matrix, and $n$ and $m$ are assumed to have zero mean.
Kalman Estimation
The goal of the optimal estimation process is to generate an optimal estimate of $x_\tau$ given a set of measurements $\{y_1, y_2, \ldots, y_t\}$. If $\tau$ is greater than, equal to, or less than $t$, the estimation process is referred to as prediction, filtering, or smoothing, respectively. An “optimal” estimate is one that minimizes a specified measure of the error between the estimated and the true plant state. Typically, this measure is the mean square error (MSE) of the estimated with respect to the true state.
Further notation and definitions are as follows. The a priori state estimate is $\hat{x}_t^- \equiv \hat{x}(t \mid y_1, \ldots, y_{t-1})$, where the right-hand side denotes the estimated value of $x_t$ given the values of $\{y_1, \ldots, y_{t-1}\}$. The a posteriori state estimate is $\hat{x}_t \equiv \hat{x}(t \mid y_1, \ldots, y_t)$. The expression
$$\eta_t^- \equiv (y_t - H\hat{x}_t^-) \tag{3}$$
is called the measurement “innovation” or “residual”, and is the difference between the actual and predicted (in the case that $\tau=t+1$) measurements at time $t$. The a priori and a posteriori state estimate errors are, respectively, $\xi_t^- = x_t - \hat{x}_t^-$ and $\xi_t = x_t - \hat{x}_t$, and the covariances of these respective errors are $p_t^- = \mathrm{Cov}(\xi_t^-)$ and $p_t = \mathrm{Cov}(\xi_t)$. Since the estimation algorithm does not have access to the actual state $x_t$, $p_t^-$ and $p_t$ are not directly known. They are iteratively estimated by the algorithm below, starting from arbitrary initial values; these estimates at time $t$ are denoted by (capital) $P_t^-$ and $P_t$ respectively.
Kalman's solution for the optimal filter (i.e., the case $\tau=t$) is then described by the following procedure: Given arbitrary initial values for $\hat{x}_0$ and $P_0$ ($P_0$ is however typically initialized to be a symmetric positive-definite matrix), iteratively compute for $t=1, 2, \ldots$:
$$\hat{x}_t^- = F\hat{x}_{t-1} + Bu_{t-1} \tag{4}$$
$$P_t^- = FP_{t-1}F' + Q \tag{5}$$
$$K_t = P_t^-H'(HP_t^-H' + R)^{-1} \tag{6}$$
$$\hat{x}_t = \hat{x}_t^- + K_t\eta_t^- = \hat{x}_t^- + K_t(y_t - H\hat{x}_t^-) \tag{7}$$
$$P_t = (I - K_tH)P_t^-(I - K_tH)' + K_tRK_t'. \tag{8}$$
Combining Eqs. 5 and 8 yields
$$P_{t+1}^- = FP_t^-F' + Q - FK_t(HP_t^-H' + R)K_t'F'. \tag{9}$$
This iterative solution may be regarded as consisting of two parts: an execution step during which a new estimate is computed (Eqs. 4 and 7) using an estimation function (here, $K_t$ or $(I-K_tH)$, which are related to $P_t^-$); and an updating step during which a new estimation function is generated (Eqs. 6 and 9) in terms of an estimation function at a previous time.
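As a worked check, offered here for illustration and not taken from the cited reference, the step from Eqs. 5, 6, and 8 to Eq. 9 may be spelled out as follows. The shorthand $M_t$ is introduced only for this derivation, and $P_t^-$ and $R$ are assumed to be symmetric, so that $HP_t^- = (P_t^-H')' = M_tK_t'$.

```latex
\begin{aligned}
M_t &\equiv HP_t^-H' + R, \qquad K_tM_t = P_t^-H' \quad \text{(from Eq. 6)},\\
P_t &= P_t^- - K_tHP_t^- - P_t^-H'K_t' + K_t(HP_t^-H' + R)K_t' \quad \text{(expanding Eq. 8)}\\
    &= P_t^- - K_tM_tK_t' - K_tM_tK_t' + K_tM_tK_t' \;=\; P_t^- - K_tM_tK_t',\\
P_{t+1}^- &= FP_tF' + Q \;=\; FP_t^-F' + Q - FK_tM_tK_t'F' \quad \text{(Eq. 5 at time } t+1\text{)}.
\end{aligned}
```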
The solution for the optimal prediction problem with $\tau=t+1$ is given by the same equations, using Eq. 4 to generate the optimal prediction $\hat{x}_{t+1}^-$ given measurements through time $t$.
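By way of illustration only, the filter iteration of Eqs. 4 through 8 may be sketched in Python with NumPy as follows. This is a minimal sketch under stated assumptions, not an implementation taken from the cited reference: the function name `kalman_step`, the two-state example system, the noise levels, and the random seed are all hypothetical choices made for the example.

```python
import numpy as np

def kalman_step(x_post, P_post, u_prev, y, F, B, H, Q, R):
    """One filter iteration (Eqs. 4-8).

    Returns the a priori estimate and covariance, the gain, and the
    a posteriori estimate and covariance.
    """
    x_prior = F @ x_post + B @ u_prev                    # Eq. 4
    P_prior = F @ P_post @ F.T + Q                       # Eq. 5
    M = H @ P_prior @ H.T + R                            # innovation covariance
    K = P_prior @ H.T @ np.linalg.inv(M)                 # Eq. 6
    x_new = x_prior + K @ (y - H @ x_prior)              # Eq. 7
    I_KH = np.eye(len(x_post)) - K @ H
    P_new = I_KH @ P_prior @ I_KH.T + K @ R @ K.T        # Eq. 8
    return x_prior, P_prior, K, x_new, P_new

# Hypothetical example system; every number below is an illustrative assumption.
F = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.5], [1.0]])
H = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)
R = np.array([[0.25]])

rng = np.random.default_rng(0)
x_true = np.array([0.0, 1.0])
x_hat, P = np.zeros(2), np.eye(2)       # arbitrary initial estimate, symmetric P_0
P_priors, gains = [], []
for t in range(1, 51):
    u_prev = np.zeros(1)                # no control applied in this example
    x_true = F @ x_true + B @ u_prev + rng.multivariate_normal(np.zeros(2), Q)  # Eq. 1
    y = H @ x_true + rng.multivariate_normal(np.zeros(1), R)                    # Eq. 2
    x_prior, P_prior, K, x_hat, P = kalman_step(x_hat, P, u_prev, y, F, B, H, Q, R)
    P_priors.append(P_prior)
    gains.append(K)
```

In this sketch the stored `P_priors` and `gains` do not depend on the measurements, which reflects the fact that the updating step (Eqs. 5, 6, and 8) can be run independently of the execution step (Eqs. 4 and 7).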
One extension of classical Kalman filtering, as noted above, is the extended Kalman filter (EKF) method. This can be used when the linearity conditions above are not strictly satisfied. EKF can be used to generate a linearized approximation of the plant and measurement processes, and may be iteratively applied to obtain a sequence of linearized approximations. The resulting solution is not strictly optimal in general, but only approximately so. The Kalman equations can also be generalized by allowing one or more of the matrices H, F, R, and Q to be functions of t. Also, when the parameters of the plant evolution and measurement processes are not known, one can perform a separate computation, “system identification”, to determine or estimate these parameters.
Kalman Control
We turn now to optimal Kalman control. The stochastic linear quadratic Gaussian (LQG) control problem can be described as follows, in the case of “full observability” (i.e., where the plant state vector $x_t$ at each time is assumed known). From a current plant state $x_{t_{\mathrm{curr}}}$ at time $t_{\mathrm{curr}}<N$, it is desired to optimize a set of control output signals $\{u_{t_{\mathrm{curr}}}, u_{t_{\mathrm{curr}}+1}, \ldots, u_{N-1}\}$. The trajectory of the plant state (that is, the sequence of states $\{x_{t_{\mathrm{curr}}+1}, \ldots, x_N\}$) will depend on the choice of the $u$ values, on the initial state $x_{t_{\mathrm{curr}}}$, and on the noise terms $\{m_{t_{\mathrm{curr}}}, \ldots, m_{N-1}\}$.
The quantity to be optimized is called the “cost-to-go”, and has the following form. The cost-to-go of the trajectory comprises a control-action cost $u_t'gu_t$ and a target-deviation cost $x_t'rx_t$ for each time step $t$ from $t_{\mathrm{curr}}$ to $N$, where both $g$ and $r$ are specified symmetric positive-definite matrices. Thus the cost-to-go, denoted $J$, is
$$J = \sum_{t=t_{\mathrm{curr}}}^{N-1} \left(u_t'gu_t + x_t'rx_t\right) + x_N'rx_N. \tag{10}$$
The optimization problem consists of determining the set of values of $u$ for which the value of $J$, averaged over initial position $x_{t_{\mathrm{curr}}}$ and plant noise, is a minimum. The solution has the form $u_t=-L_tx_t$, where $L_t$ is the Kalman control matrix at time $t$.
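For concreteness, the cost-to-go of Eq. 10 may be evaluated for a single given trajectory as sketched below. The function name `cost_to_go` and the scalar example values are assumptions introduced for illustration; the example trajectory is a noise-free one generated with $F=B=1$ and a fixed gain of 0.5, which is not claimed to be optimal.

```python
import numpy as np

def cost_to_go(xs, us, g, r):
    """Evaluate Eq. 10 for one trajectory.

    xs: states x_tcurr, ..., x_N (one more entry than us)
    us: controls u_tcurr, ..., u_{N-1}
    """
    assert len(xs) == len(us) + 1
    # Running cost over t = tcurr .. N-1 (zip stops at the last control).
    running = sum(u @ g @ u + x @ r @ x for x, u in zip(xs, us))
    # Terminal cost at t = N.
    terminal = xs[-1] @ r @ xs[-1]
    return float(running + terminal)

g = np.array([[1.0]])
r = np.array([[1.0]])
xs = [np.array([1.0]), np.array([0.5]), np.array([0.25])]
us = [np.array([-0.5]), np.array([-0.25])]
J = cost_to_go(xs, us, g, r)   # 1.25 + 0.3125 + 0.0625 = 1.625
```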
In one extension of the above formulation, at least one of the matrices $g$ and $r$ may be functions of $t$. Also, the cost-to-go function may include additional cross-terms involving both $x$ and $u$. See, for example, the University of Lund (Sweden) online lecture notes at http://www.control.lth.se/~kursdr/lectures/f13LQsynth.pdf. Also, the cost-to-go may be described in terms of a desired target state $x_{\mathrm{targ}}$, in which case each $x_t$ in Equation 10 should be replaced by $x_t-x_{\mathrm{targ}}$. In that case, the target state may itself vary with time, as for example in the case that the goal is to generate control signals so that the plant state $x_t$ may approximately follow a desired target trajectory.
In certain special cases, the cost-to-go may be formulated in such a way that the optimal matrices $L_t$ are independent of time. This is referred to as a “stationary” optimal control problem. In one such case, the final time $t=N$ is considered to be effectively infinite. In another such case [see, e.g., Szita and Lörincz, Neural Computation, vol. 16 (2004), pp. 491-499], the final time $t=N$ is considered to be indeterminate; i.e., if the cost-to-go terms have not terminated by an arbitrary time step $t$, they are assumed to terminate at the next time step $t+1$ with a given constant probability.
For the optimal control problem stated above, the classical Kalman solution is described by the following procedure, where $S$ is an auxiliary matrix that is a function of $t$. Starting with $S_N=r$, iteratively compute for the decreasing sequence of time index values $t=N, N-1, \ldots, t_{\mathrm{curr}}$:
$$L_t = (B'S_tB + g)^{-1}B'S_tF \tag{11}$$
$$S_{t-1} = F'S_tF + r - L_t'(B'S_tB + g)L_t. \tag{12}$$
Then, for $t=t_{\mathrm{curr}}$ and (optionally) for later $t$, use the Kalman $L_t$ matrices to compute the control vectors $u_t$:
$$u_t = -L_tx_t \tag{13}$$
This process generates the set of optimal control output vectors $\{u_{t_{\mathrm{curr}}}, \ldots, u_{N-1}\}$. As in the case of estimation, but with a key difference (regarding the order of computation of functions having different values of the time index), this iterative solution may be regarded as consisting of two parts: an execution step during which a new control vector is computed (Eq. 13) using a control function (here, $L_t$, which is related to $S_t$); and an updating step during which a new control function is generated (Eqs. 11 and 12) in terms of a control function that is associated with a later time index.
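The backward iteration of Eqs. 11 and 12 may be sketched as follows. This is a minimal illustration under stated assumptions: the names `riccati_step` and `backward_gains` and the scalar example values are hypothetical, and the gains are returned in the order in which they are computed (latest time index first) rather than being labeled with explicit time indices.

```python
import numpy as np

def riccati_step(S, F, B, g, r):
    """Apply Eqs. 11 and 12 once: from S at one time index, return the
    control matrix L and the S matrix for the preceding time index."""
    M = B.T @ S @ B + g
    L = np.linalg.solve(M, B.T @ S @ F)        # Eq. 11 (solve instead of explicit inverse)
    S_prev = F.T @ S @ F + r - L.T @ M @ L     # Eq. 12
    return L, S_prev

def backward_gains(F, B, g, r, n_steps):
    """Iterate backward in time, starting from S_N = r."""
    S = r.copy()
    gains = []
    for _ in range(n_steps):
        L, S = riccati_step(S, F, B, g, r)
        gains.append(L)                        # gains[0] belongs to the latest time index
    return gains, S

# Hypothetical scalar example: F = B = g = r = 1.
F = np.array([[1.0]])
B = np.array([[1.0]])
g = np.array([[1.0]])
r = np.array([[1.0]])
gains, S_final = backward_gains(F, B, g, r, n_steps=50)

x_now = np.array([2.0])
u_now = -gains[-1] @ x_now                     # Eq. 13, using the last-computed (earliest-time) gain
```

For this example the first two gains are 0.5 and 0.6, and after many backward steps the gain settles near 0.618, which illustrates how a long horizon leads toward the time-independent ("stationary") control matrix discussed above.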
Note that, as observed by Kalman, the solutions to the optimal estimation and control problems are mathematically “dual” to one another. Equations 11, 13, and 12 are related by this duality to Equations 6, 7, and 9 of the Kalman filter solution, respectively. As noted above, however, the $S$ and $L$ matrices of the control problem are iteratively computed from later to earlier time index values (i.e., “backward in time”), whereas the $P^-$ and $K$ matrices of the filtering problem are computed “forward in time”. The Kalman filter matrix $K$ can therefore be computed at time $t$ and immediately applied to determine the optimal a posteriori state estimate $\hat{x}_t$, whereas the Kalman control matrices $L_t$ must first be computed for a sequence of decreasing time index values, before they can be applied at the current time $t=t_{\mathrm{curr}}$ and then at later times.
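The duality may be checked numerically with a short sketch. The particular substitution used here ($F \to F'$, $B \to H'$, $g \to R$, $r \to Q$, $S \to P^-$) is the conventional way of stating the correspondence and is an assumption of this example, as are the function names and the example matrices. Under that substitution, one application of Eq. 12 reproduces Eq. 9, and the control matrix of Eq. 11 equals the transpose of $FK$.

```python
import numpy as np

def control_step(S, F, B, g, r):
    """Eqs. 11 and 12: return L and the S matrix for the preceding time index."""
    M = B.T @ S @ B + g
    L = np.linalg.solve(M, B.T @ S @ F)                        # Eq. 11
    return L, F.T @ S @ F + r - L.T @ M @ L                    # Eq. 12

def filter_step(P_prior, F, H, Q, R):
    """Eqs. 6 and 9: return K and the a priori covariance for the next time index."""
    M = H @ P_prior @ H.T + R
    K = P_prior @ H.T @ np.linalg.inv(M)                       # Eq. 6
    return K, F @ P_prior @ F.T + Q - F @ K @ M @ K.T @ F.T    # Eq. 9

# Hypothetical example matrices (illustrative values only).
F = np.array([[1.0, 1.0], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)
R = np.array([[0.25]])
P = np.array([[2.0, 0.5], [0.5, 1.0]])     # symmetric positive-definite

K, P_next = filter_step(P, F, H, Q, R)
L_dual, S_dual = control_step(P, F.T, H.T, R, Q)   # control recursion on the dual system
```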
Artificial Neural Networks
An artificial neural network (ANN) is characterized by processor nodes and connections between them, each node and connection typically being capable of performing only relatively simple computations, wherein the behavior of the nodes and connections is described by parameter values which either may be fixed or may be modifiable according to a specified learning or update rule. Some of the parameters describing the connections are referred to as connection “weights” or “strengths”. An ANN may be implemented either in hardware, or as software that is run on a general-purpose computer. A processor node is a computation unit (in hardware) or a simulation of such a unit (in software), wherein at least one input value is transformed into at least one output value. Typically the node computes, as its output, a specified function (which may be nonlinear) of the sum of its input values. Each input value or “activity” is typically described by a numerical value that may be time-varying. Time may be continuous-valued or may consist of discrete time steps. In a hardware implementation, an activity may be represented by any of a variety of signals, e.g., a level of, or change in level of, a current, voltage, or other physical quantity, wherein the magnitude, timing, and/or repetitive nature of the level or change carries the information. A processor node may be an internal node of the network (connected only to other nodes of the network), or may be a sensor or input node (providing or transducing information from the external environment to the network), or may be an effector or output node (generating signals from the network that influence the external environment).
Each connection conveys an activity value from one processor to another. The connection may be passive (i.e., may just transmit the value unchanged from one processor to the other) or may be active (transform the value en route). In the latter case, the particular transformation is specified by at least one parameter. Typically the transformation consists of a simple multiplication of the activity value by the connection strength.
Neural network computations are often described in terms of the behavior of a set of related nodes and connections. Suppose that there are two sets of nodes, one set of which (the source nodes) provides input to the other set (the target nodes). An activity vector, for example $z=(z^1, z^2, \ldots, z^n)$, at a given time represents the fact that the activity at node $k$ is equal to $z^k$ at that time. (Note that here the superscripts denote index values, not exponents.) A matrix $C$ refers to a set of connections from source to target nodes, where $C^{kj}$ is the strength of the connection from node $j$ to node $k$. In the case that each target node simply computes the sum of its weighted inputs, then a target activity vector $z$ is related to a source activity vector $v$ by the matrix equation $z=Cv$, which represents the set of equations $z^k=\sum_j C^{kj}v^j$. If the $k$th target node computes a nonlinear function $g^k$ of the sum of its inputs, then we have instead $z^k=g^k(\sum_j C^{kj}v^j)$.
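The relation between source and target activity vectors may be illustrated by the following minimal sketch. The function name `target_activity` and the numerical values are assumptions for illustration, and for simplicity the same nonlinearity (here `np.tanh`) is applied at every target node, whereas the text allows a different function at each node.

```python
import numpy as np

def target_activity(C, v, g=None):
    """Compute z = C v, or z = g(C v) when a nonlinearity g is supplied.

    C[k, j] is the strength of the connection from source node j to target node k.
    """
    weighted_sum = C @ v
    return weighted_sum if g is None else g(weighted_sum)

C = np.array([[1.0, -1.0, 0.5],
              [0.0,  2.0, 1.0]])    # 3 source nodes, 2 target nodes
v = np.array([1.0, 2.0, 4.0])

z_linear = target_activity(C, v)             # [1 - 2 + 2, 0 + 4 + 4] = [1, 8]
z_nonlinear = target_activity(C, v, np.tanh)
```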
In an ANN, connection weights may be “directed”, meaning that $C^{ji}$ and $C^{ij}$ correspond to different connections (from $i$ to $j$ and from $j$ to $i$ respectively), and (if both connections exist) have strengths that need not be equal; or “undirected”, meaning that the two strengths are required to be equal to each other.
A layered ANN is one in which the nodes are organized into two or more groupings (called layers, but in the present invention not implying any particular geometric arrangement). Within a layered network, the connections may be (a) within a single layer (lateral connections), (b) from a “lower” to a “higher” layer in a hierarchy (feedforward connections), and/or (c) from a “higher” to a “lower” layer (feedback or recurrent connections). A layered ANN with recurrent connections is also called a recurrent neural network (RNN).
Adjustable connection strengths, and other adjustable parameters, may be modified using a learning rule. Typically, a learning rule is “local”, meaning that the rule makes use only of activity values, connection strengths, and other parameters or state information that are available at the node or connection being modified. For example, a simple version of a “Hebbian” learning rule changes a strength $C^{ji}$ by an amount proportional to the product of the activities $z^i$ and $z^j$ at either end of the connection from $i$ to $j$. In some cases, a learning rule may make use of certain global values or signals that are available at all, or a subset of, the nodes and connections. For example, in a “reinforcement learning” algorithm, the change in strength may be proportional to a quantity that represents the overall benefit or cost resulting from an output signal produced by the network at a previous time step.
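A simple Hebbian update of the kind mentioned above may be sketched as follows. The function name `hebbian_update`, the proportionality constant `eta`, and the example activities are assumptions for illustration; the update for each connection uses only the two activities at its ends, which is what makes the rule local.

```python
import numpy as np

def hebbian_update(C, z_source, z_target, eta=0.1):
    """Change each strength C[j, i] (from source node i to target node j)
    by eta * z_target[j] * z_source[i]."""
    return C + eta * np.outer(z_target, z_source)

C = np.zeros((2, 3))                     # 3 source nodes, 2 target nodes
z_source = np.array([1.0, 0.0, 2.0])
z_target = np.array([1.0, -1.0])
C_new = hebbian_update(C, z_source, z_target)
# C_new = [[ 0.1, 0.0,  0.2],
#          [-0.1, 0.0, -0.2]]
```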
Also, in some cases, a subset of corresponding connections may be considered to be coupled or “ganged” together, so that they all have the same connection strength, and are all modified in tandem by the average of the amounts by which each such connection would have been modified if they were not so coupled. For one example of a network computation using ganged connections, see S. Becker and G. Hinton, Nature, vol. 355, pp. 161-163 (1992). This may be done either to speed up a computation (as in the above reference), or to cause corresponding parts of a network to behave in a coordinated manner.
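The averaging of updates over a ganged set of connections may be sketched as follows. The function name `ganged_update`, the representation of each gang as a list of indices into a flat array of strengths, and the example values are assumptions for illustration and are not taken from the cited reference.

```python
import numpy as np

def ganged_update(strengths, proposed_deltas, groups):
    """Apply to every connection in a gang the average of the changes that
    the connections in that gang would have received individually.

    groups: list of index lists; each list names connections sharing one strength.
    """
    updated = strengths.copy()
    for idx in groups:
        updated[idx] += proposed_deltas[idx].mean()
    return updated

strengths = np.array([0.5, 0.5, 1.0])        # connections 0 and 1 are ganged
proposed = np.array([0.2, 0.4, -0.1])        # changes each would receive if uncoupled
new_strengths = ganged_update(strengths, proposed, [[0, 1], [2]])
# connections 0 and 1 both change by 0.3; connection 2 changes by -0.1
```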
Other types of ANNs and learning rules have been described in the literature; for a good introduction, see J. Hertz, A. Krogh, and R. G. Palmer, Introduction to the Theory of Neural Computation, Addison-Wesley 1991. As one example, the information conveyed between nodes of an ANN may be carried by the precise timing of spikes (rapid changes in activity value) rather than by the numerical value of the activity itself. As a second example, another type of ANN is a “radial basis function” (RBF) network. The output of an RBF node decreases as a function of the distance between the set of inputs to the node (regarded as a vector) and a “prototype vector” that is stored as a set of parameters for that node; the prototype vector is typically modified according to an appropriate learning rule.
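The output of a single RBF node may be sketched as follows. The Gaussian form of the decreasing function, the parameter `width`, and the function name `rbf_output` are assumptions made for this example; the text above requires only that the output decrease with distance from the prototype vector.

```python
import numpy as np

def rbf_output(inputs, prototype, width=1.0):
    """Gaussian RBF node: output is 1 at the prototype and decreases with distance."""
    dist_sq = np.sum((inputs - prototype) ** 2)
    return float(np.exp(-dist_sq / (2.0 * width ** 2)))

prototype = np.array([1.0, 2.0])
at_prototype = rbf_output(np.array([1.0, 2.0]), prototype)   # distance 0
near = rbf_output(np.array([1.5, 2.0]), prototype)           # distance 0.5
far = rbf_output(np.array([3.0, 2.0]), prototype)            # distance 2
```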
Use of ANNs in Estimation and Control Problems
Typically, an ANN is used to perform an estimation function (such as prediction) by a process of “supervised learning”, using a training set comprising both measurement values and the desired plant state values to be estimated. The network parameters (e.g., the connection strengths) are initialized, a set of measurement values is presented as input to the ANN, a set of output values is generated by the ANN, a measure of the error between the ANN's output and the desired output (the plant state at a given time) is computed, that error measure is used to modify the connection weights (and other adjustable parameters, if any) of the ANN according to a specified learning algorithm such as the “back-propagation” algorithm, and the process is iteratively repeated with a new set of inputs. The ANN's weights may thereby converge to values that cause the ANN to generate outputs whose error measure is sufficiently small.
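The supervised training loop described above may be illustrated with a deliberately simplified sketch. Instead of a multilayer network trained by back-propagation, the example uses a single linear layer trained by plain batch gradient descent on the mean squared error (the delta rule), which is a swapped-in simplification. The measurement matrix `H`, the noise level, the learning rate, and the random seed are hypothetical. Note that the training set includes the actual plant states as desired outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
H = np.array([[1.0, 0.5], [0.0, 1.0], [1.0, -1.0]])     # hypothetical measurement matrix
states = rng.standard_normal((200, 2))                   # desired outputs (plant states)
measurements = states @ H.T + 0.1 * rng.standard_normal((200, 3))   # network inputs

W = np.zeros((2, 3))          # connection strengths, initialized to zero
learning_rate = 0.05
losses = []
for _ in range(500):
    outputs = measurements @ W.T                  # network output for every training pair
    errors = outputs - states                     # actual output minus desired output
    losses.append(float(np.mean(np.sum(errors ** 2, axis=1))))
    gradient = 2.0 * errors.T @ measurements / len(states)
    W -= learning_rate * gradient                 # gradient-descent weight update
```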
To treat the control problem, an ANN is typically used to generate one or more control outputs whose values depend upon the measurement inputs and the ANN's weights; the control outputs act on the plant (or on a simulated model of the plant) to alter the plant state; a measurement (or computation) of the new plant state is used as the new input to the network; and a measure of the error between the measured plant state and the target plant state is computed and used to modify the ANN's weights in accordance with a learning algorithm.
Standard methods for training the weights of a recurrent neural network (RNN) based on time sequences of input values and desired output values include “real-time recurrent learning” and “back-propagation through time”, both of which are described in Hertz et al., ibid., pp. 182-186.
J. T. Lo, in U.S. Pat. Nos. 5,963,929 and 5,408,424, describes the use of an RNN for optimal or near-optimal filtering (a form of estimation). In the '929 patent, a “recursive neurofilter” is used to estimate a signal process with respect to an estimation error criterion, where at least one weight is a nonlinear weight and is adjusted during a separate training process. Also described is the use of a second “neurofilter” (a second circuit or algorithm), to produce an approximation of the statistical error of the estimates. In both the '929 and '424 patents, the neural network is trained using pairs of inputs (measurement data and the actual state of the plant), as discussed above. These pairs are used to train the connection strengths in a generic RNN (see col. 42 of the '424 patent specification). The patent relies on the general knowledge of “supervised learning” methods in the ANN field, regarding how to learn to match actual output with desired output, in order to train the weights. Note that information about the actual plant state is required for operation of this method. In practice, however, one may have access only to measurement data, and not to the plant state itself, preventing one from using this method. Note also that: (a) the adjustable weights are kept fixed during operation of the network (that is, those weights are not being learned or adjusted while the filter is actually performing its estimation of a signal process); (b) the second “neurofilter” is distinct from the first “neurofilter”; and (c) although Kalman estimation is discussed as an alternative estimation method, Kalman estimation is not involved in the networks described (i.e., the networks neither learn nor use Kalman estimation).
Neural networks have been used in conjunction with the Kalman estimation (also referred to as the Kalman filter, or KF) equations in several ways.
First, the KF or EKF equations have been used to compute how the weights in an ANN should be modified. The ANN weights to be determined are treated as the unknown parameter values in a system identification problem, sets of input values and desired output values are specified, and the KF or EKF equations are used to determine the ANN weights based on the sets of input and desired-output values. The equations are solved by means of conventional mathematical steps including matrix multiplication and inversion. That is, the weights are not computed or updated (learned) by means of a neural network. Instead, the weights are read out from the neural network, provided as input arguments to the KF or EKF algorithm (which is not a neural network algorithm), the updated weight values are then provided as outputs of the KF or EKF algorithm, and the updated weight values are then entered into the neural network as the new weight values for the next step of the computation. For examples of this combined usage of an ANN in conjunction with the classical KF or EKF equations, see: S. Haykin (ed.), Kalman Filtering and Neural Networks, Wiley-Interscience, 2001, and in particular the chapter by G. V. Puskorius and L. A. Feldkamp; S. Singhal and L. Wu, “Training Multilayer Perceptrons with the Extended Kalman Algorithm”, in D. S. Touretzky (ed.), Advances in Neural Information Processing Systems, vol. 1 (Morgan Kaufmann Publ., San Mateo Calif., 1989), pp. 133-140; R. J. Williams, “Training recurrent networks using the extended Kalman filter”, in Proceedings of the International Joint Conference on Neural Networks, June, Baltimore, Md., Vol. IV (1992), pp. 241-246; and I. Rivals and L. Personnaz, “A recursive algorithm based on the extended Kalman filter for the training of feedforward neural models”, Neurocomputing, vol. 20 (1998), pp. 279-294.
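The general idea of treating network weights as the unknown state of a Kalman filter may be illustrated for the simplest possible case, a single linear node. This sketch is not the method of any of the cited references: because the node is linear in its weights, the ordinary KF equations (Eqs. 4 through 8) suffice, whereas an EKF would linearize a nonlinear network output about the current weight estimate. The weights are modeled as constant ($F=I$, $Q=0$), the input pattern at each step plays the role of the matrix $H$, and all names and numerical values are hypothetical. As the text notes, the computation is carried out by conventional matrix operations outside the network.

```python
import numpy as np

rng = np.random.default_rng(1)
w_true = np.array([0.5, -1.0, 2.0])     # hypothetical weights to be identified
n = len(w_true)

w_hat = np.zeros(n)                     # weight estimate (the KF "state")
P = 100.0 * np.eye(n)                   # estimate of the weight-error covariance
R = np.array([[0.05 ** 2]])             # assumed variance of the output noise

for _ in range(60):
    x = rng.standard_normal(n)                          # input pattern
    d = w_true @ x + 0.05 * rng.standard_normal()       # desired output (noisy)
    Hx = x.reshape(1, n)                                # measurement matrix for this step
    # With F = I and Q = 0, Eqs. 4 and 5 leave w_hat and P unchanged.
    K = P @ Hx.T @ np.linalg.inv(Hx @ P @ Hx.T + R)     # Eq. 6
    w_hat = w_hat + (K @ (d - Hx @ w_hat)).ravel()      # Eq. 7
    I_KH = np.eye(n) - K @ Hx
    P = I_KH @ P @ I_KH.T + K @ R @ K.T                 # Eq. 8
```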
Second, one can combine the output from a nonlinear ANN with that of a (non-neural) KF algorithm, to improve predictions when applied to a nonlinear plant process. See for example Klimasauskas et al., “Hybrid linear-neural network process control”, U.S. Pat. Nos. 6,278,962 and 5,877,954. This method involves two analyzers: a primary “data-derived” one that generates a primary output, and a secondary “error correction analyzer” that generates a predicted error output; then the two analyzers' outputs are summed. Here the NN weights are trained to minimize the difference between (a) the sum of the linear filter output and the NN output, and (b) the desired output (e.g., the actual signal vector). That is, the NN learns how the linear filter output differs from the actual signal (or the desired output), and attempts to compensate for that difference. Here the NN does not learn a Kalman filter (KF); the KF is used alongside the NN, and the two perform complementary tasks. The KF is not computed or learned by means of a neural network or a set of neural computations. A similar combination is used in Tresp et al., “Method and arrangement for the neural modelling of a dynamic system with non-linear stochastic behavior”, U.S. Pat. No. 6,272,480. This method combines the output of a nonlinear recurrent neural network (RNN), with an error measure (based on a linear-equation model of the system error) that is modeled using a KF, in order to alter the RNN. The KF equations are not implemented within a neural network. They are solved, e.g., using a conventional computer program, and their values are then used to adjust the neural network's behavior.
Third, an ANN learning rule has recently been described that is motivated by the KF equations, although it does not implement the learning of a KF within an ANN. See G. Szirtes, B. Póczos, and A. Lörincz, in Neurocomputing, vols. 65-66 (2005), pp. 349-355. In this method, the KF equations are first altered in several ways. One of these alterations replaces the matrix product HK, where H is as in Eq. 2 and K is the KF matrix, by the identity matrix I. This assumption that HK is approximately equal to I is, however, not valid in general, but only under special conditions. Another of these alterations neglects the off-diagonal terms in a matrix that is being updated. This alteration is stated to be justified when the estimated plant process satisfies an assumption of “independent component transformation”. For this assumption to hold, special preprocessing steps are in general required. Another alteration neglects a “self-excitatory” contribution. Yet another alteration introduces a random vector in order “to provide a conventional neuronal equation”, but this alteration incorrectly changes a significant term in the learning equation, so that the contribution of that term is averaged to zero. These alterations cause the resulting “learning rule” to fail to learn a KF matrix, even approximately. Furthermore, results displayed in that reference show a decreasing estimation error. However, the decrease shown is not the result of an approximation to a KF matrix having been learned. Similar results are obtained even when an arbitrary matrix K is used (instead of using an approximation to the KF matrix), with no learning at all taking place. For these reasons, this method does not implement the learning of a KF within an ANN.
A classic formulation of the optimal control problem is Bellman's method of dynamic programming, in which (at each time step) a system has a set of allowed transitions from each state to a set of other states. A cost is associated with each transition, and the problem is to determine an optimal or near-optimal control policy that governs which transition to choose when in each state, to minimize an overall cost function. A class of ANN algorithms, based on “temporal difference” (TD) learning (a form of reinforcement learning) and its extensions, has been developed to learn the transition costs and a control policy. For a reference on TD learning (and reinforcement learning more generally) see: R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.
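The basic temporal-difference update may be illustrated by a minimal tabular sketch. This is not an ANN implementation and not the algorithm of any cited reference: it evaluates the cost-to-go of a fixed policy on a hypothetical deterministic chain of states, with a cost of 1 per transition and no discounting, so that the true cost-to-go from each state equals the number of remaining transitions. The function name and all parameter values are assumptions.

```python
def td0_chain(n_states=3, step_cost=1.0, alpha=0.5, gamma=1.0, episodes=200):
    """Tabular TD(0) on a chain 0 -> 1 -> ... -> terminal under a fixed policy."""
    V = [0.0] * (n_states + 1)          # V[n_states] is the terminal state, fixed at 0
    for _ in range(episodes):
        for s in range(n_states):       # one pass along the chain per episode
            s_next = s + 1
            # TD error: observed cost plus estimated cost-to-go of the next state,
            # minus the current estimate for this state.
            td_error = step_cost + gamma * V[s_next] - V[s]
            V[s] += alpha * td_error
    return V[:n_states]

values = td0_chain()    # approaches [3.0, 2.0, 1.0]
```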
For the more specialized case of a linear quadratic Gaussian (LQG) system, or for a linear quadratic (LQ) deterministic system, the optimal Kalman control (KC) solution is as discussed above. The general KC solution has not been implemented within an ANN. However, a solution of a particular specialized form of the LQG control problem, namely, the “stationary” optimal control problem discussed above, has been implemented using an ANN with a TD learning rule (Szita and Lörincz, loc. cit.). In this special case, as noted above, the cost-to-go terms are assumed to terminate at any future time step with a given constant probability.