This invention relates to neural networks or adaptive nonlinear filters which contain linear dynamics, or memory, embedded within the filter. In particular, this invention describes a new system and method by which such filters can be efficiently trained to process temporal data.
In problems concerning the emulation, control, or post-processing of nonlinear dynamic systems, it is often the case that the exact system dynamics are difficult to model. A typical solution is to train the parameters of a nonlinear filter to perform the desired processing, based on a set of inputs and a set of desired outputs, termed the training signals. Since its discovery in 1985, backpropagation (BP) has emerged as the standard technique for training multi-layer adaptive filters to implement static functions, to operate on tapped-delay-line inputs, and in recursive filters where the desired outputs of the filters are known [1, 2, 3, 4, 5]. The principle of static BP was extended to networks with embedded memory via backpropagation-through-time (BPTT), which has been used to train network parameters in feedback loops when components in the loop are modeled [6] or unmodeled [7]. For the special case of finite impulse response (FIR) filters, of the type discussed in this paper, the BPTT algorithm has been further refined [8]. Like BP, BPTT is a steepest-descent method, but it accounts for the fact that the outputs of a layer continue to propagate through the network for an extended length of time; consequently, it updates network parameters according to the error they produce over the time spanned by the training data. In essence, BP and BPTT apply steepest descent successively to each layer of a nonlinear filter. It has been shown [9] that the steepest-descent approach is locally H∞ optimal in prediction applications where the training inputs vary at each weight update, or training epoch. When the same training data is used for several epochs, however, BPTT is suboptimal, and techniques which generate updates closer to the Newton update direction (see section 10) are preferable; we will refer to such techniques as Newton-like methods.
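The distinction between the steepest-descent and Newton update directions can be sketched for the simplest case, a single-layer linear FIR filter fit by least squares. This is an illustrative example only, not the method of the invention; the filter length, signals, and learning rate below are arbitrary assumptions.

```python
import numpy as np

# Fit the taps w of a linear FIR filter y[n] = sum_k w[k] * x[n-k]
# to a desired output d, comparing one steepest-descent step with
# one Gauss-Newton step on the sum-squared error.

rng = np.random.default_rng(0)
N, taps = 200, 4
x = rng.standard_normal(N)

# Regressor matrix X of delayed inputs, so that y = X @ w.
X = np.column_stack(
    [np.concatenate([np.zeros(k), x[:N - k]]) for k in range(taps)]
)
w_true = np.array([0.5, -0.3, 0.2, 0.1])  # hypothetical "true" taps
d = X @ w_true                            # desired (training) output

w = np.zeros(taps)          # initial tap weights
e = d - X @ w               # error signal over the training span
grad = -2.0 * X.T @ e       # gradient of the sum-squared error

# Steepest descent (as in BP/BPTT): step opposite the gradient,
# scaled by a small learning rate.
w_sd = w - 1e-3 * grad

# Gauss-Newton: scale the gradient by the inverse Hessian (2 X^T X
# for this quadratic cost), reaching the minimum in a single step.
w_gn = w - np.linalg.solve(2.0 * X.T @ X, grad)
```

For this quadratic cost the Newton direction recovers the optimal taps in one step, while the steepest-descent step moves only a small distance toward them; the gap between the two directions is what motivates the Newton-like methods discussed above.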
Since steepest-descent techniques such as BPTT often behave poorly in terms of convergence rate and error minimization, it is an object of this invention to create a method by which Newton-like optimization techniques can be applied to nonlinear adaptive filters containing embedded memory for the purpose of processing temporal data. It is a further object of this invention to create an optimization technique better suited to training a FIR or IIR network to process temporal data than classical Newton-like techniques [10]. It is a further object of this invention to create multi-layer adaptive filters which are tailor-made for specific applications and can be efficiently trained with the novel Newton-like algorithm.