This invention is concerned with the problem of discrete-time optimal filtering, namely the problem of processing a discrete-time measurement process for the purpose of estimating a discrete-time signal process, even if the ranges of the measurement and/or signal processes are large.
In a standard formulation of the problem in the modern theory of optimal filtering, whether or not the ranges of the measurement and signal processes are large, the signal process and measurement process are described by the mathematical/statistical model

$$x(t+1) = f(x(t), t) + G(x(t), t)\,\xi(t), \qquad x(0) = x_0, \tag{1}$$

$$y(t) = h(x(t), t) + \epsilon(t), \tag{2}$$
where $x(t)$ is an $n$-dimensional stochastic process; $y(t)$ is an $m$-dimensional stochastic process; $x_0$ is a Gaussian random vector with mean $\bar{x}_0$ and covariance $\Pi_0$; $\xi(t)$ and $\epsilon(t)$ are respectively $n_1$-dimensional and $m_1$-dimensional Gaussian noise processes with zero means; $x_0$, $\xi(t)$ and $\epsilon(t)$ have given joint probability distributions; and $f(x, t)$, $G(x, t)$ and $h(x, t)$ are known functions whose dimensions and properties are such that (1) and (2) faithfully describe the evolutions of the signal and measurement.

The problem of discrete-time optimal filtering is to design and make a discrete-time dynamic system that inputs $y(t)$ and outputs an estimate $\hat{x}(t)$ of $x(t)$ at each time $t = 1, 2, \ldots, T$, which estimate minimizes a given estimation error criterion. Here $T$ is a positive integer or infinity. The dynamic system is called an optimal filter with respect to the given estimation error criterion. At each time $t_1$, the dynamic state of the optimal filter must carry the optimal conditional statistics given all the measurements $y(t)$ received up to and including time $t_1$, so that at the next time $t_1 + 1$ the optimal filter can receive and process $y(t_1 + 1)$ using these statistics and then produce the optimal estimate $\hat{x}(t_1 + 1)$. The most widely used estimation error criterion is the mean square error criterion, $E[\|x(t) - \hat{x}(t)\|^2]$, where $E$ and $\|\cdot\|$ denote the expectation and the Euclidean norm respectively. The estimate $\hat{x}(t)$ that minimizes this criterion is called the minimum variance estimate or the least-squares estimate.
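To make the model (1) and (2) and the mean square error criterion concrete, the following is a minimal sketch for the scalar case ($n = m = n_1 = m_1 = 1$). It assumes, for illustration only, that $\xi(t)$ and $\epsilon(t)$ are mutually independent zero-mean Gaussian white noises independent of $x_0$; the names `simulate` and `mse` and the example functions are hypothetical, not taken from the source.

```python
import math
import random

def simulate(f, G, h, x0_mean, x0_std, q_std, r_std, T, seed=0):
    """Generate one realization of the scalar model (1)-(2).

    x(t+1) = f(x(t), t) + G(x(t), t) * xi(t),  x(0) ~ N(x0_mean, x0_std**2)
    y(t)   = h(x(t), t) + eps(t)
    Assumption: xi and eps are independent zero-mean Gaussian white noises
    with standard deviations q_std and r_std.
    Returns xs = [x(0), ..., x(T)] and ys = [y(1), ..., y(T)].
    """
    rng = random.Random(seed)
    x = rng.gauss(x0_mean, x0_std)
    xs, ys = [x], []
    for t in range(T):
        x = f(x, t) + G(x, t) * rng.gauss(0.0, q_std)   # signal equation (1)
        xs.append(x)
        ys.append(h(x, t + 1) + rng.gauss(0.0, r_std))  # measurement equation (2)
    return xs, ys

def mse(xs, xhats):
    """Sample estimate of E[||x(t) - xhat(t)||^2] for equal-length scalar sequences."""
    if len(xs) != len(xhats):
        raise ValueError("sequences must have equal length")
    return sum((a - b) ** 2 for a, b in zip(xs, xhats)) / len(xs)

# Usage with a hypothetical nonlinear signal/measurement pair:
xs, ys = simulate(lambda x, t: 0.5 * x + 25 * x / (1 + x * x),
                  lambda x, t: 1.0,
                  lambda x, t: x * x / 20,
                  0.0, 1.0, 1.0, 1.0, T=50)
```

Averaging `mse` over many such realizations approximates the expectation in the criterion; a single realization gives only a noisy estimate of it.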
The most commonly used method of treating such a problem, whether or not the ranges of the measurement and signal processes are large, is the use of a Kalman filter (KF) or an extended Kalman filter (EKF). A detailed description of the KF and EKF (and some other approximate nonlinear filters) can be found in, e.g., A. H. Jazwinski, Stochastic Processes and Filtering Theory, pp. 194-358, Academic Press (1970), and B. D. O. Anderson and J. B. Moore, Optimal Filtering, pp. 36-287, Prentice-Hall (1979). The KF and EKF have been applied to a wide range of areas including aircraft/ship inertial and aided-inertial navigation, spacecraft orbit determination, satellite attitude estimation, phased array radar tracking, nuclear power plant failure detection, power station control, oceanographic surveying, biomedical engineering, and process control. Many important papers on the application of the KF and EKF can be found in H. W. Sorenson, editor, Kalman Filtering: Theory and Application, IEEE Press (1985).
In the rare cases where $f$ and $h$ are linear functions of $x(t)$ and $G$ does not depend on $x(t)$, the model (1) and (2) is called the linear-Gaussian model. If the KF is used for a linear-Gaussian model, the resulting estimate $\hat{x}(t)$ is the minimum variance (or least-squares) estimate. In most cases, however, these linearity conditions on $f$, $h$ and $G$ are not satisfied, and the EKF is used instead. At each time point, the EKF, which is a suboptimal approximate filter, first linearizes $f$ and $G$ at the estimated value of $x(t)$ and linearizes $h$ at the predicted value of $x(t+1)$. It then uses the KF equations to update the estimated value of $x(t+1)$ and the predicted value of $x(t+2)$ upon receiving the new measurement $y(t+1)$. Iterating the linearization and estimation a certain number of times, or until convergence, at each time point yields the so-called iterated EKF (IEKF). Since both the EKF and IEKF involve linearization, they are not optimal filters. In fact, when either the random driving term $G(x(t), t)\,\xi(t)$ in (1) or the random measurement noise $\epsilon(t)$ in (2) has such large variances and covariances that the aforementioned estimated and predicted values of the signal are not very close to the true signal, and/or when the functions $f$, $G$ and $h$ are not very smooth, the linearization may be a poor approximation, and the EKF as well as the IEKF may yield poor estimates or even fail totally.
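The EKF cycle just described can be sketched for the scalar case as follows. This is a minimal, non-authoritative illustration assuming scalar Gaussian noises with variances `q` and `r` and user-supplied derivatives `fx` and `hx` (hypothetical names); the prediction of $x(t+2)$ is folded into the time update of the next call. With `iterations > 1`, $h$ is re-linearized about the latest estimate, which is one common form of the IEKF. When $f$ and $h$ are linear and $G$ is constant, the same code reduces to the KF.

```python
def ekf_step(xhat, P, y, t, f, fx, G, h, hx, q, r, iterations=1):
    """One scalar EKF (or IEKF) cycle for model (1)-(2).

    xhat, P : estimate of x(t) and its error variance
    y       : new measurement y(t+1)
    fx, hx  : derivatives of f and h with respect to x (supplied by the user)
    q, r    : variances of xi(t) and eps(t+1)
    iterations > 1 re-linearizes h about the latest estimate (iterated EKF).
    Returns the updated estimate of x(t+1) and its error variance.
    """
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    # Time update: linearize f and G at the current estimate of x(t).
    x_pred = f(xhat, t)
    F = fx(xhat, t)
    g = G(xhat, t)
    P_pred = F * P * F + g * q * g
    # Measurement update: linearize h at the predicted value of x(t+1).
    x_new = x_pred
    for _ in range(iterations):
        H = hx(x_new, t + 1)
        S = H * P_pred * H + r          # innovation variance
        K = P_pred * H / S              # Kalman gain
        # Gauss-Newton form; on the first pass x_new == x_pred, giving the plain EKF.
        x_new = x_pred + K * (y - h(x_new, t + 1) - H * (x_pred - x_new))
    P_new = (1.0 - K * H) * P_pred
    return x_new, P_new
```

Because every quantity is computed from a first-order Taylor expansion, nothing in this recursion accounts for the curvature of $f$, $G$ or $h$, which is why large noise variances or non-smooth functions can degrade it as described above.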
This shortcoming of the EKF and IEKF has motivated an enormous amount of work on nonlinear filtering in the past thirty years or so. But the results have been disappointing. With very few, if any, exceptions, the nonlinear filtering results have been confined to research papers and textbooks. This state of the art is reflected in V. Krishnan, Nonlinear Filtering and Smoothing: An Introduction to Martingales, Stochastic Integrals and Estimation, John Wiley & Sons (1984) and R. S. Liptser and A. N. Shiryayev, Statistics of Random Processes I: General Theory and II: Applications, Springer-Verlag (1977). The EKF and, to a much lesser extent, the IEKF remain the standard filters for estimating stochastic signals. This 30-year failure is believed to be related to the methodology that has been used since R. E. Kalman derived the KF equations. That methodology is analysis: starting with a mathematical/statistical model, it searches for a solution consisting of analytic formulas and/or equations that describe the structures and determine the parameters of the filter.
Because of the inherent inaccuracies and frequent failures of the EKF and IEKF and the restrictive and unrealistic assumptions and prohibitive computational requirements of other existing filters, new filters are needed that will consistently yield a high degree of estimation accuracy vis-a-vis the information contained in the measurements about the signal and that can be applied in a large variety of real-world situations.
Recent years have seen a rapid growth in the development of artificial neural networks (ANNs), which are also known as connectionist models, parallel distributed processors, neuroprocessors, and neurocomputers. Being crude mathematical models of theorized mind and brain activity, ANNs exploit the massively parallel processing and distributed information representation properties that are believed to exist in a brain. A good introduction to ANNs can be found in R. Hecht-Nielsen, Neurocomputing, Addison-Wesley (1990) and J. Hertz, A. Krogh and R. G. Palmer, Introduction to the Theory of Neural Computation, Addison-Wesley (1991).
There is a large number of ANN paradigms such as Hopfield networks, high-order networks, counter-propagation networks, bidirectional associative memories, piecewise linear machines, neocognitrons, self-organizing feature maps, adaptive resonance theory networks, Boltzmann machines, multilayer perceptrons (MLPs), MLPs with various feedback structures, other recurrent neural network paradigms, etc. These and other ANN paradigms have been applied to systems control (e.g., D. A. White and D. A. Sofge, editors, Handbook of Intelligent Control, Van Nostrand Reinhold (1992)), signal processing (e.g., B. Kosko, editor, Neural Networks for Signal Processing, Prentice Hall (1992)), speech processing (e.g., D. P. Morgan and C. L. Scofield, Neural Networks and Speech Processing, Kluwer Academic Publishers (1991)), and others (e.g., E. Sanchez-Sinencio and C. Lau, editors, Artificial Neural Networks, IEEE Press (1992)).
There are many patent documents concerning the applications of ANNs. Among them, the two that seem most relevant to the present invention are the following. In U.S. Pat. No. 5,003,490 (1991) to P. F. Castelaz and D. E. Mills, a multilayer perceptron with a sigmoid activation function and a tapped delay line for the input is used to classify input waveforms. In U.S. Pat. No. 5,150,323 (1992) to P. F. Castelaz, a multilayer perceptron with a sigmoid activation function and a couple of tapped delay lines for preprocessed inputs is used for in-band separation of a composite signal into its constituent signals.
There are many research articles concerning applications of ANNs, most of which can be found in the foregoing books, journals (e.g., IEEE Transactions on Neural Networks, Neural Networks, and Neural Computation), and conference proceedings (e.g., Proceedings of the International Joint Conference on Neural Networks). Applications of two groups of the aforementioned neural network paradigms to optimal filtering have been reported in the open literature since 1989. The applications of the first group to optimal filtering were reported in S. I. Sudharsanan and M. K. Sundareshan, "Maximum A Posteriori State Estimation: A Neural Processing Algorithm," Proceedings of the 28th Conference on Decision and Control, pp. 1805-1806 (1989), and in Q. Sun, A. T. Alouani, T. R. Rice and J. E. Gray, "A Neural Network Computation Algorithm for Discrete-Time Linear System State Estimation," Proceedings of the 1992 International Joint Conference on Neural Networks, pp. I-443-458 (1992). The signal and measurement processes considered therein are described by the linear-Gaussian model, and the neural networks used are Hopfield networks, with the neural activation function slightly modified in the first paper cited above. The connection weights and neuron biases of the network are determined from the Kalman filter (KF) equations so that, when the Hopfield network stabilizes at each time point, the stable state is the minimum variance estimate. The usefulness of the method is very limited, because it can only be applied to the linear-Gaussian model, for which the KF equations are available, and because the weights and biases of the Hopfield network must be updated during its operation by other means, namely the Kalman filter equations or a slight modification of them.
The applications of the second group of the aforementioned neural network paradigms to optimal filtering were reported in the open literature by J. P. DeCruyenaere and H. M. Hafez, "A Comparison Between Kalman Filters and Recurrent Neural Networks," Proceedings of the 1992 International Joint Conference on Neural Networks, pp. IV-247-251 (1992); J. T.-H. Lo, "Neural Network Approach to Optimal Filtering," invited paper presented at the First World Congress of Nonlinear Analysts, Tampa, Florida (1992); J. T.-H. Lo, "Optimal Filtering by Recurrent Neural Networks," Proceedings of the Thirtieth Annual Allerton Conference on Communication, Control and Computing, pp. 903-912 (1992); J. T.-H. Lo, "Synthetic Approach to Optimal Filtering," Proceedings of the 1992 International Simulation Technology Conference and 1992 Workshop on Neural Networks, pp. 475-481 (1992). The second group of the aforementioned neural network paradigms consists of multilayer perceptrons with feedbacks.
Through these publications, a new approach emerged in the open literature. In contrast to the analytic methodology used in the foregoing conventional filtering theory and in the foregoing application of the first group of neural network paradigms to optimal filtering, the new approach is synthetic in nature. Signal and measurement realizations, generated either by computer simulation or by actual experiment, are synthesized into a filter as follows. At least one multilayer perceptron with some feedback structure is trained and tested until its filtering performance with respect to the mean square error criterion is satisfactory or cannot be significantly improved by increasing the size of the multilayer perceptron (with the given feedback structure), whichever comes first. A trained multilayer perceptron (with the given feedback structure) is then selected as the filter, with network size analyzed against filtering accuracy to optimize cost effectiveness.
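The stopping rule in the synthetic approach (grow the network until its test error is satisfactory or no longer improves significantly) can be sketched as a simple loop. This is a hedged illustration only: `train_and_test` is a hypothetical user-supplied routine standing in for the actual training and testing on simulated realizations, and the thresholds `target_mse` and `min_improvement` are assumptions, not values from the source.

```python
def select_network_size(train_and_test, sizes, target_mse, min_improvement):
    """Grow the network until its test MSE is satisfactory or stops improving.

    train_and_test(size) -> (model, test_mse) is a hypothetical routine that
    trains an MLP with a fixed feedback structure and `size` hidden neurons
    on simulated signal/measurement realizations and returns its test MSE.
    Returns (best, history): best = (size, model, mse) of the chosen network,
    history = [(size, mse), ...] for inspecting size versus accuracy.
    """
    history = []
    best = None
    for size in sizes:
        model, err = train_and_test(size)
        history.append((size, err))
        if best is not None and best[2] - err < min_improvement:
            break  # a larger network no longer helps significantly; keep the smaller one
        best = (size, model, err)
        if err <= target_mse:
            break  # filtering performance is satisfactory
    return best, history
```

Keeping the smaller network when the gain is insignificant reflects the cost-effectiveness trade-off mentioned above; the `history` list is what one would plot to compare network size against filtering accuracy.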
The selected multilayer perceptron (with the given feedback structure) is a recursive filter optimal for its architecture (e.g., number of layers, number of neurons in each layer, types of feedback), with the lagged feedbacks carrying the optimal statistics at each time point. Above all, it was proven that there exist multilayer perceptrons with appropriate feedback structures that approximate the optimal filter in performance, with respect to the mean square error criterion, to any desired degree of accuracy.
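The recursive structure can be illustrated with a small forward pass. The sketch below assumes one particular architecture, chosen only for illustration: a single tanh hidden layer whose inputs are the current measurement $y(t)$ and the network's own previous output $\hat{x}(t-1)$ (a one-step lagged output feedback). The class name and the random placeholder weights are hypothetical; no training is shown.

```python
import math
import random

class FeedbackMLP:
    """One-hidden-layer perceptron whose output is fed back with a one-step lag.

    At time t the network receives y(t) and its own previous output xhat(t-1);
    the lagged feedback is the only state carried between time points.
    Weights are random placeholders, not trained values.
    """

    def __init__(self, n_hidden, seed=0, scale=0.5):
        rng = random.Random(seed)
        # Each hidden neuron has weights for the inputs [y(t), xhat(t-1)] and a bias.
        self.W = [[rng.uniform(-scale, scale) for _ in range(2)] for _ in range(n_hidden)]
        self.b = [rng.uniform(-scale, scale) for _ in range(n_hidden)]
        self.v = [rng.uniform(-scale, scale) for _ in range(n_hidden)]  # output weights
        self.c = 0.0                                                    # output bias

    def step(self, y, x_prev):
        hidden = [math.tanh(w[0] * y + w[1] * x_prev + b) for w, b in zip(self.W, self.b)]
        return sum(v * z for v, z in zip(self.v, hidden)) + self.c

    def filter(self, ys, x_init=0.0):
        """Run the network recursively over a measurement sequence."""
        estimates, x_prev = [], x_init
        for y in ys:
            x_prev = self.step(y, x_prev)
            estimates.append(x_prev)
        return estimates

    def output_bound(self):
        """Every output satisfies |xhat| <= sum|v| + |c|, since |tanh| <= 1."""
        return sum(abs(v) for v in self.v) + abs(self.c)
```

The `output_bound` method makes explicit a property of this kind of network: because the hidden activations saturate, its output range is fixed once the weights are fixed, however large the measurements become.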
Because of the synthetic nature of the new approach, assumptions such as the Markov property, Gaussian distributions, and additive noise are unnecessary. However, the approach has one fundamental requirement: the measurement process in the optimal filtering problem must stay in a bounded region. In theory, this requirement is always fulfilled, since all measurable quantities in the real world can be contained in a sufficiently large bounded region. However, if the measurement process, the signal process, or both keep growing, as in typical filtering problems in satellite orbit determination and aircraft/ship navigation, then for a multilayer perceptron with a feedback structure (MLPWFS) to have a valid domain large enough to cover the range of measurements and a valid range large enough to cover the range of signals, both the MLPWFS and the training data set must be large. The larger the MLPWFS and the training data set are, the more difficult it is to train the MLPWFS on the training data set.
Furthermore, the time period or periods over which the training data are collected, by computer simulation or actual experiment, must be of finite length. If the measurement and signal processes keep growing, an MLPWFS trained on these data has difficulty generalizing beyond the foregoing time period or periods.
A simple way to extend an MLPWFS output range and to reduce an input data range is scaling. We may multiply an MLPWFS output by a constant greater than one and/or divide an input by another constant also greater than one. Alternatively, we may use a monotone increasing function to extend (or antisquash) an MLPWFS output and/or use another monotone increasing function to reduce (or squash) an input. However, scaling is a "static" method of extending and reducing a range, employing a fixed mapping to transform the range. It is not very effective in extending a bounded MLPWFS output range to cover an expanding signal process range, or in reducing an expanding measurement process range to fit a bounded MLPWFS input domain. Consequently, its usefulness is limited, as borne out in our computer simulations.
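Why static scaling cannot keep up with an expanding signal can be seen with a few lines of arithmetic. The sketch below assumes, for illustration only, a logarithmic squashing function and its inverse as the antisquashing function; the names `squash`, `antisquash`, `max_reachable` and `first_exceedance` are hypothetical. Whatever fixed gain or monotone map is chosen, a bounded network output is mapped to a bounded signal range, and a growing signal eventually leaves it.

```python
import math

def squash(y, a=1.0):
    """Monotone increasing map compressing large inputs (illustrative choice)."""
    return math.copysign(math.log1p(abs(y) / a), y)

def antisquash(u, a=1.0):
    """Inverse of squash: expands a network output back to the signal scale."""
    return math.copysign(a * math.expm1(abs(u)), u)

def max_reachable(output_bound, gain=1.0, a=None):
    """Largest signal magnitude a network output bounded by output_bound can reach.

    Constant scaling gives gain * B; antisquashing gives antisquash(B, a).
    Either way the result is a fixed number determined in advance.
    """
    if a is None:
        return gain * output_bound
    return antisquash(output_bound, a)

def first_exceedance(signal, limit, max_t=10_000):
    """First time t >= 0 at which |signal(t)| exceeds limit, or None."""
    for t in range(max_t + 1):
        if abs(signal(t)) > limit:
            return t
    return None

# A hypothetical expanding signal 0.5 * t**2 escapes the range 10 * 2.0 = 20.0 at t = 7.
escape_time = first_exceedance(lambda t: 0.5 * t * t, max_reachable(2.0, gain=10.0))
```

Enlarging the gain or changing the squashing function only postpones the escape time; it does not remove it, which is the limitation of a static mapping noted above.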
Therefore, more effective methods and apparatuses to transform MLPWFS output ranges and input data ranges are needed, when the ranges of the signal and/or measurement processes are large and/or expanding.