Recurrent networks are, in essence, dynamic system simulators whose states evolve according to nonlinear state equations. Because of their dynamic nature, they have been successfully applied to many problems involving the modeling and processing of temporal signals, such as adaptive control, forecasting, system identification, and speech recognition. Such networks are typically implemented in computer software.
Recurrent networks generate outputs based on a plurality of inputs according to the formula:

x(k+1) = ƒ[W x(k)]

In the above equation, x is a vector representing one or more outputs of the recurrent network. For simplicity of description, x is used herein without subscripts. k is a time index associated with the output x. For example, x(2) refers to the value of the vector x that is determined from the previous values of x, namely x(0) and x(1). As a physical example, x may represent the most likely next purchase of a consumer. In such a case, x(2) represents the most likely next purchase of a consumer based on the predicted values for the customer's first (x(0)) and second (x(1)) purchases. In the above equation, ƒ represents a function that is chosen to represent the dynamic system being simulated, and W is a weight matrix that is “trained” to better predict values for x. As described in greater detail below, the inputs to the recurrent network are previous values of x; for example, in the above description, x(0) and x(1) are used as inputs to predict x(2).
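The recurrence above can be sketched in a few lines of code. This is a minimal illustration, not an implementation from the source: the state dimension, the random initial weights, and the choice of ƒ = tanh are all assumptions made here for concreteness.

```python
import numpy as np

# Illustrative sketch of the recurrence x(k+1) = f[W x(k)],
# assuming a 3-state network and f = tanh (the source does not fix f).
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 3)) * 0.5   # weight matrix (would be trained)
x = np.array([1.0, 0.0, 0.0])           # initial state x(0)

trajectory = [x]
for k in range(5):                       # evolve the state for 5 steps
    x = np.tanh(W @ x)                   # x(k+1) = f[W x(k)]
    trajectory.append(x)
```

Each new state depends only on the previous state and the fixed matrix W, which is what makes the network a discrete-time dynamic system.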
Despite the potential and capability of recurrent networks, the main problem they present is the proper training of the weight matrix. Many existing algorithms are complex and slow to converge. The training problem consists of adjusting the weights of the network so that the trajectories of its dynamic states have certain specified properties. Because a weight adjustment affects the values of x at all times during the course of the network's state evolution, obtaining the error gradient (that is, the gradient, with respect to the weights W, of the difference between the predicted values x and the actual measured values, represented herein as d) is a rather complicated procedure.
Some previous attempts at training recurrent networks have focused on computational methods for obtaining the gradient. These approaches require an excessive number of iterations to generate the desired weights.