This invention is concerned with the use of an artificial recurrent neural network for processing a discrete-time process to produce a discrete-time output process with respect to a performance criterion, even if the range of said discrete-time process or said discrete-time output process is necessarily large and/or necessarily keeps expanding in time during the operation of the artificial recurrent neural network. Here the range of a process at a time refers to the set of all possible values of the process up to and including that time. The performance criterion is selected in accordance with the purpose of said processing in such fields as signal/speech processing, system identification/control, communication, robotics, economics, geophysics, sonar/radar data processing, oceanography, time series prediction, financial market forecasting, etc.
There are many good books on the applications of recurrent neural networks (e.g., D. A. White and D. A. Sofge, editors, Handbook of Intelligent Control, Van Nostrand Reinhold (1992); B. Kosko, editor, Neural Networks for Signal Processing, Prentice Hall (1992); D. P. Morgan and C. L. Scofield, Neural Networks and Speech Processing, Kluwer Academic Publishers (1991); and E. Sanchez-Sinencio and C. Lau, editors, Artificial Neural Networks, IEEE Press (1992)). There are also a large number of research articles concerning the applications of recurrent neural networks, which can be found in journals (e.g., IEEE Transactions on Neural Networks, Neural Networks, and Neural Computation) and in conference proceedings (e.g., Proceedings of the International Joint Conference on Neural Networks).
Many patent documents exist concerning the applications of recurrent neural networks (RNNs). Three that seem highly relevant to the present invention are as follows. In U.S. Pat. No. 5,003,490 (1991) to P. F. Castelaz and D. E. Mills, a multilayer perceptron with a sigmoid activation function and a tapped delay line for the input is used to classify input waveforms. In U.S. Pat. No. 5,150,323 (1992) to P. F. Castelaz, a multilayer perceptron with a sigmoid activation function and a couple of tapped delay lines for preprocessed inputs is used for in-band separation of a composite signal into its constituent signals. In U.S. Pat. No. 5,408,424 (1995) to James T. Lo, recurrent neural networks are used for optimal filtering.
There are two fundamental difficulties in applying RNNs (recurrent neural networks). First, when the range of the input process to an RNN or the range of the output process from the RNN is necessarily large compared with the desired level of accuracy of the RNN's processing, the numbers of neurons and connections of the RNN, the size of the training data set, and the amount of training computation usually must be very large. (Recall that the range of a process at a time is defined to be the set of all possible values of the process up to and including that time.) For instance, if the 3-1-3 Euler angles (θ, φ, ψ) describing the attitude of a satellite must be determined to within 20 arcseconds, the range of these Euler angles, which is the set [−180°, 180°)³, is relatively large.
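To make "relatively large" concrete, the short calculation below counts how many accuracy-sized cells the range of each Euler angle contains, and how many cells the joint range of the three angles contains. It is only an illustration added here, not part of the original disclosure; the cell count is a rough proxy for how finely the RNN must resolve its input or output, and the names are hypothetical.

```python
# Illustrative arithmetic for the satellite-attitude example (not from the original text).
ARCSEC_PER_DEGREE = 3600
RANGE_WIDTH_DEG = 360        # each Euler angle lies in [-180°, 180°)
ACCURACY_ARCSEC = 20         # desired accuracy per angle

def levels(width_arcsec: int, accuracy_arcsec: int) -> int:
    """Number of accuracy-sized cells that tile an interval of the given width."""
    return width_arcsec // accuracy_arcsec

levels_per_angle = levels(RANGE_WIDTH_DEG * ARCSEC_PER_DEGREE, ACCURACY_ARCSEC)
levels_joint = levels_per_angle ** 3   # cells in [-180°, 180°)^3

print(f"{levels_per_angle:,} cells per angle")    # 64,800 cells per angle
print(f"{levels_joint:.3e} cells for (θ, φ, ψ)")  # 2.721e+14 cells for (θ, φ, ψ)
```

Each angle alone spans 64,800 times the required accuracy, which is the sense in which the range is large relative to the accuracy.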
Second, if the range of the input process to an RNN or the range of the output process from the RNN keeps growing in time during its operation, the RNN, which has to be trained on a training data set consisting of time series of finite lengths, usually cannot generalize beyond these finite lengths of time. For instance, in tracking the positions of airplanes with a Doppler radar system, the distance from the radar system to an airplane may grow from, say, 5 miles to 200 miles in 5000 seconds. If the desired tracking accuracy is 30 feet and the time series in the training data are 1000 seconds long, an RNN trained on these data to track airplanes from the radar measurements can be expected to lose the desired level of accuracy after perhaps 1200 seconds.
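The sketch below illustrates the radar example numerically. The text gives only the endpoints of the growth (5 miles at the start, 200 miles after 5000 seconds), so the distance is assumed here to grow at a constant rate, and each training series is assumed to start at 5 miles; both are hypothetical simplifications, as are the function names.

```python
# Hypothetical constant-rate model: the text gives only the endpoints of the growth.
FEET_PER_MILE = 5280
START_MI, END_MI, DURATION_S = 5.0, 200.0, 5000.0
TRAIN_LENGTH_S = 1000.0
ACCURACY_FT = 30.0

def distance_miles(t_s: float) -> float:
    """Radar-to-airplane distance at time t, ASSUMING linear growth from START_MI to END_MI."""
    return START_MI + (END_MI - START_MI) * t_s / DURATION_S

def levels(distance_mi: float) -> float:
    """How many 30-foot cells fit in the distance range [0, distance_mi]."""
    return distance_mi * FEET_PER_MILE / ACCURACY_FT

seen = distance_miles(TRAIN_LENGTH_S)   # largest distance in a training series
final = distance_miles(DURATION_S)      # distance reached during operation

print(seen, final)                      # 44.0 200.0
print(levels(seen), levels(final))      # 7744.0 35200.0
```

Under these assumptions the training series never show distances beyond 44 miles, while operation reaches 200 miles, so most of the operating range lies outside anything the RNN was trained on.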
To eliminate or alleviate these difficulties, a process must be preprocessed before it is input to an RNN, or the output process produced by the RNN must be postprocessed. We call the process before the preprocessing the exogenous input process, and we call the process immediately after the postprocessing the outward output process, in order to distinguish them from the input and output processes of the RNN, which are the processes immediately input to and output from the RNN, respectively.
A simple and common way to reduce the range of an exogenous input process or to extend the range of an RNN's output process is scaling. We may use a squashing function to reduce (or squash) the range of an exogenous input process, and an antisquashing function to extend (or antisquash) the range of an RNN's output process. However, it has been observed in numerical examples that scaling methods are usually not effective in overcoming either of the two difficulties described above.
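One plausible way to see why scaling helps little (an illustration added here, not an explanation given in the original) is that squashing keeps the range bounded but shrinks the differences that matter. The sketch assumes a tanh squashing function and its inverse, atanh, as the antisquashing function, with a hypothetical scale of 200 miles expressed in feet; these choices are assumptions, not the specific functions the text has in mind.

```python
import math

SCALE_FT = 200 * 5280.0   # hypothetical scale: 200 miles expressed in feet

def squash(x: float, scale: float = SCALE_FT) -> float:
    """Squashing function: maps any real x into (-1, 1)."""
    return math.tanh(x / scale)

def antisquash(y: float, scale: float = SCALE_FT) -> float:
    """Antisquashing function: inverse of squash, maps (-1, 1) onto the real line."""
    return scale * math.atanh(y)

x1 = 100 * 5280.0          # an airplane 100 miles away, in feet
x2 = x1 + 30.0             # 30 feet farther: the accuracy target of the radar example
gap = squash(x2) - squash(x1)

print(f"{gap:.2e}")        # roughly 2.2e-05: the RNN must still resolve this difference
```

After squashing, two positions 30 feet apart differ by only about 2e-5, so the relative resolution the RNN must achieve is unchanged; symmetrically, antisquashing magnifies any small error in the RNN's output.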
Therefore, an apparatus that is more effective than an RNN alone or an RNN with a scaling preprocessor and/or postprocessor is needed for processing an exogenous input process to produce a good outward output process with respect to a performance criterion, even if the range of said exogenous input process or that of said outward output process is necessarily large and/or necessarily keeps growing during the operation of said apparatus.
Recurrent neural networks (RNNs) have been applied in a large number of fields including system identification/control, signal/speech processing, communication, robotics, economics, oceanography, geophysics, radar/sonar data processing, target tracking, sensor fusion, active noise cancellation, and financial market forecasting. However, if the range of an exogenous input process or the range of the outward output process of an RNN is necessarily large or necessarily expands in time, the required sizes of the RNN and the training data set may be very large, and the training of the RNN may be very difficult, especially if good processing performance is needed. Furthermore, the resulting RNN may not be able to generalize beyond the finite time interval over which the training data are available.
To overcome these difficulties, an apparatus is disclosed here for processing an exogenous input process to produce a good outward output process with respect to a performance criterion, even if said exogenous input process or outward output process has a large or expanding range. The basic idea is to employ range reducers and range extenders in accordance with the teachings of this invention. Each of these range reducers and range extenders is a dynamic transformer and is thus called a dynamic range transformer. A range reducer dynamically transforms at least one component of an exogenous input process, while a range extender dynamically transforms the outputs of at least one output neuron of an RNN.
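The sketch below shows the basic idea in its simplest form. The text here does not say how the estimates are formed, so the sketch assumes the simplest possible estimate, the previous value: the range reducer then subtracts the previous exogenous input (a first difference), and the range extender adds the previous outward output (an accumulator). The class names and the previous-value estimator are hypothetical; they are not the specific range reducers and extenders disclosed in this invention.

```python
class RangeReducer:
    """Range reducer by estimate subtraction, with a HYPOTHETICAL previous-value estimate."""

    def __init__(self, initial_estimate: float = 0.0) -> None:
        self.estimate = initial_estimate   # estimate of the current exogenous input

    def step(self, y: float) -> float:
        reduced = y - self.estimate        # the RNN sees only this difference
        self.estimate = y                  # assumed estimate for the next time step
        return reduced


class RangeExtender:
    """Range extender by estimate addition, with a HYPOTHETICAL previous-value estimate."""

    def __init__(self, initial_estimate: float = 0.0) -> None:
        self.estimate = initial_estimate   # estimate of the current outward output

    def step(self, beta: float) -> float:
        x = beta + self.estimate           # beta: output of an RNN output neuron
        self.estimate = x                  # assumed estimate for the next time step
        return x


ys = [100.0 + 2.0 * t for t in range(6)]   # exogenous input whose range keeps expanding
reducer = RangeReducer(initial_estimate=100.0)
reduced = [reducer.step(y) for y in ys]    # [0.0, 2.0, 2.0, 2.0, 2.0, 2.0]

extender = RangeExtender(initial_estimate=100.0)
restored = [extender.step(r) for r in reduced]  # identity "RNN": restores ys exactly
print(reduced, restored)
```

Although the exogenous input keeps growing, the reduced input stays bounded, and with an identity map standing in for the RNN the extender reproduces the original sequence. In a real system the RNN would sit between the two and only has to learn bounded corrections.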
In accordance with the teachings of the present invention, the apparatus is a neural system comprising an RNN and at least one dynamic range transformer, which is either a range reducer or a range extender, and at least one weight of said RNN is a nonlinear weight determined by training said neural system with respect to a training criterion defined to reflect the performance criterion.
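As a rough picture of such a neural system, the sketch below runs a forward pass through a range reducer, a one-neuron recurrent network, and a range extender. Everything here is a hypothetical simplification: the previous-value estimates, the single tanh hidden neuron, the linear output neuron, and the weight names `w_in`, `w_rec`, and `w_out`. The outward output depends nonlinearly on `w_in` and `w_rec` because they act through tanh, which is one plausible reading of "nonlinear weight"; training of these weights is not shown.

```python
import math
from typing import List

def neural_system(ys: List[float], w_in: float, w_rec: float, w_out: float,
                  y0: float, x0: float) -> List[float]:
    """Forward pass of a hypothetical neural system: range reducer -> RNN -> range extender.

    Reducer: subtracts the previous exogenous input (assumed estimate, initially y0).
    RNN: one recurrent tanh neuron feeding one linear output neuron.
    Extender: adds the previous outward output (assumed estimate, initially x0).
    """
    h = 0.0            # RNN hidden state
    prev_y = y0        # reducer's estimate of the current exogenous input
    prev_x = x0        # extender's estimate of the current outward output
    outward = []
    for y in ys:
        r = y - prev_y                       # range reducer by estimate subtraction
        prev_y = y
        h = math.tanh(w_in * r + w_rec * h)  # recurrent hidden neuron
        beta = w_out * h                     # output neuron of the RNN
        x = beta + prev_x                    # range extender by estimate addition
        prev_x = x
        outward.append(x)
    return outward


ys = [100.0 + 2.0 * t for t in range(10)]    # exogenous input with an expanding range
outward = neural_system(ys, w_in=0.5, w_rec=0.3, w_out=2.0, y0=100.0, x0=100.0)
print([round(x, 2) for x in outward])
```

Because tanh is bounded, each step of the outward output changes by less than `|w_out|`, while the extender lets the outward output itself grow without bound.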
Three types of range extender by estimate addition and two types of range reducer by estimate subtraction are disclosed as example dynamic range transformers. Other types of range extender and reducer are possible. Different range reducers and extenders may differ in effectiveness and in computational cost, and the selection of range extenders and reducers for a neural system is governed by the trade-off between the two.
Formulas for training a neural system comprising an RNN and at least one range reducer or range extender of the disclosed types are provided in the form of pseudo computer programs.