This invention is concerned with the problem of discrete-time optimal filtering, namely the problem of processing a discrete-time measurement process for the purpose of estimating a discrete-time signal process.
In the standard formulation of the problem in the modern theory of optimal filtering, the signal process and measurement process are described by the mathematical/statistical model:

$$x(t+1) = f(x(t),t) + G(x(t),t)\,\xi(t), \qquad x(1) = x_1, \tag{1}$$

$$y(t) = h(x(t),t) + \epsilon(t), \tag{2}$$
where $x(t)$ is an $n$-dimensional stochastic process; $y(t)$ is an $m$-dimensional stochastic process; $x_1$ is a Gaussian random vector; $\xi(t)$ and $\epsilon(t)$ are respectively $n_1$-dimensional and $m_1$-dimensional Gaussian noise processes with zero means; $x_1$, $\xi(t)$ and $\epsilon(t)$ have given joint probability distributions; and $f(x,t)$, $G(x,t)$ and $h(x,t)$ are known functions with such appropriate dimensions and properties that (1) and (2) describe faithfully the evolutions of the signal and measurement. The problem of discrete-time optimal filtering is to design and make a discrete-time dynamic system that inputs $y(t)$ and outputs an estimate $\hat{x}(t)$ of $x(t)$ at each time $t = 1, 2, \ldots, T$, which estimate minimizes a given estimation error criterion. Here $T$ is a positive integer or infinity. The dynamic system is called an optimal filter with respect to the given estimation error criterion. The dynamic state of the optimal filter at a time $t_1$ must carry the optimal conditional statistics given all the measurements $y(t)$ that have been received up to and including the time $t_1$, so that at the next time $t_1+1$, the optimal filter will receive and process $y(t_1+1)$ using the optimal conditional statistics from $t_1$, and then produce the optimal estimate $\hat{x}(t_1+1)$. The most widely used estimation error criterion is the mean square error criterion, $E[\|x(t)-\hat{x}(t)\|^2]$, where $E$ and $\|\cdot\|$ denote the expectation and the Euclidean norm respectively. The estimate $\hat{x}(t)$ that minimizes this criterion is called the minimum variance estimate or the least-squares estimate.
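As an illustration only, the model (1) and (2) and the mean square error criterion can be sketched for the scalar case (n = m = 1). The sketch assumes, beyond what is stated above, that the initial state, the driving noise and the measurement noise are mutually independent with constant standard deviations. The function names simulate and mean_square_error and all parameter names are hypothetical, and the time average computed along one sample path is only an empirical stand-in for the expectation E.

```python
import random


def simulate(f, G, h, T, x1_mean=0.0, x1_std=1.0,
             xi_std=1.0, eps_std=1.0, seed=0):
    """Simulate a scalar signal x(t) and measurement y(t) for t = 1..T.

    Assumption (not from the text): x(1), xi(t) and epsilon(t) are
    mutually independent Gaussians with constant standard deviations.
    """
    rng = random.Random(seed)
    x = rng.gauss(x1_mean, x1_std)            # x(1) = x_1, Gaussian
    xs, ys = [], []
    for t in range(1, T + 1):
        y = h(x, t) + rng.gauss(0.0, eps_std)  # measurement equation (2)
        xs.append(x)
        ys.append(y)
        # signal equation (1): advance x(t) to x(t+1)
        x = f(x, t) + G(x, t) * rng.gauss(0.0, xi_std)
    return xs, ys


def mean_square_error(xs, x_hats):
    """Time-averaged squared error between a signal path and its estimates."""
    return sum((x - xh) ** 2 for x, xh in zip(xs, x_hats)) / len(xs)


if __name__ == "__main__":
    # Hypothetical example functions, chosen only to exercise the code.
    xs, ys = simulate(f=lambda x, t: 0.5 * x,
                      G=lambda x, t: 1.0,
                      h=lambda x, t: x,
                      T=200)
    # Naive estimate: take the raw measurement as the estimate of x(t).
    print(mean_square_error(xs, ys))
```

Any candidate filter can be scored in the same way by passing its outputs as x_hats; a filter that is optimal for the mean square error criterion would, on average over many sample paths, score no worse than any other filter driven by the same measurements.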
The most commonly used method of treating such a problem is the use of a Kalman filter (KF) or an extended Kalman filter (EKF). A detailed description of the KF and EKF (and some other approximate nonlinear filters) can be found in e.g., A. H. Jazwinski, Stochastic Processes and Filtering Theory, pp. 194-358, Academic Press (1970), and B. D. O. Anderson and J. B. Moore, Optimal Filtering, pp. 36-287, Prentice-Hall (1979). The KF and EKF have been applied to a wide range of areas including aircraft/ship inertial and aided-inertial navigation, spacecraft orbit determination, satellite attitude estimation, phase array radar tracking, nuclear power plant failure detection, power station control, oceanographic surveying, biomedical engineering, and process control. Many important papers on the application of the KF and EKF can be found in H. W. Sorenson, editor, Kalman Filtering: Theory and Application, IEEE Press (1985).
In the rare cases where $f$ and $h$ are linear functions of $x(t)$ and $G$ does not depend on $x(t)$, the model, (1) and (2), is called the linear-Gaussian model. If the KF is used for a linear-Gaussian model, the resulting estimate $\hat{x}(t)$ is the minimum variance (or the least-squares) estimate. In most cases, however, the foregoing linearity conditions on $f$, $h$ and $G$ are not satisfied and the EKF is used. At each time point, the EKF, which is a suboptimal approximate filter, first linearizes $f$ and $G$ at the estimated value of $x(t)$ and linearizes $h$ at the predicted value of $x(t+1)$. Then the EKF uses the KF equations to update the estimated value of $x(t+1)$ and the predicted value of $x(t+2)$ for the new measurement $y(t+1)$. By iterating the linearization and estimation a certain number of times or until convergence at each time point, we have the so-called iterated EKF (IEKF). Since both the EKF and IEKF involve linearization, they are not optimal filters. In fact, when either the random driving term $G(x(t),t)\,\xi(t)$ in (1) or the random measurement noise $\epsilon(t)$ in (2) has such large variances and covariances that the aforementioned estimated value and predicted value of the signal are not very close to the true signal, and/or when the functions $f$, $G$ and $h$ are not very smooth, the linearization may be a poor approximation and the EKF as well as IEKF may yield poor estimates or even fail totally.
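The linearize-then-update cycle described above can be sketched for the scalar case as follows. This is a minimal illustration and not a definitive EKF implementation. It assumes that the driving noise and the measurement noise are white, mutually uncorrelated, and of constant variances q and r. It approximates the derivatives of f and h by central differences, and it simply evaluates G at the current estimate, a common simplification. The names ekf_step and numerical_derivative are hypothetical.

```python
def numerical_derivative(func, x, t, step=1e-6):
    """Central-difference approximation of the derivative of func(x, t) in x."""
    return (func(x + step, t) - func(x - step, t)) / (2.0 * step)


def ekf_step(x_est, p_est, y_next, t, f, G, h, q, r):
    """One scalar EKF cycle from time t to time t+1.

    x_est, p_est : estimate of x(t) and its error variance
    y_next       : new measurement y(t+1)
    q, r         : assumed variances of xi(t) and epsilon(t)
    Returns the estimate of x(t+1) and its error variance.
    """
    # Prediction: linearize f at the current estimate, evaluate G there.
    f_slope = numerical_derivative(f, x_est, t)
    g_val = G(x_est, t)
    x_pred = f(x_est, t)
    p_pred = f_slope * p_est * f_slope + g_val * q * g_val

    # Update: linearize h at the predicted value, then apply the KF equations.
    h_slope = numerical_derivative(h, x_pred, t + 1)
    innovation = y_next - h(x_pred, t + 1)
    innovation_var = h_slope * p_pred * h_slope + r
    gain = p_pred * h_slope / innovation_var
    x_new = x_pred + gain * innovation
    p_new = (1.0 - gain * h_slope) * p_pred
    return x_new, p_new


if __name__ == "__main__":
    # Linear-Gaussian special case: the step reduces to the ordinary KF.
    x_new, p_new = ekf_step(x_est=0.0, p_est=1.0, y_next=2.0, t=1,
                            f=lambda x, t: x, G=lambda x, t: 1.0,
                            h=lambda x, t: x, q=0.0, r=1.0)
    print(x_new, p_new)
```

In the linear example the gain is 0.5, so the updated estimate lies halfway between the prediction 0 and the measurement 2 and the error variance is halved. For strongly nonlinear f or h the slopes computed at the estimate may differ greatly from those at the true signal, which is the source of the poor approximation discussed above.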
This shortcoming of the EKF and IEKF has motivated an enormous amount of work on nonlinear filtering in the past thirty years or so. But the results have been disappointing. With very few, if any, exceptions, the nonlinear filtering results have been confined to research papers and textbooks. The EKF and, to a much lesser extent, the IEKF remain the standard filters for estimating stochastic signals.
The 30-year failure is believed to be related to the methodology that has been used since R. E. Kalman derived the KF equations. The methodology is analysis. Starting with a mathematical/statistical model, the methodology searches for a solution consisting of analytic formulas and/or equations that describe the structures and determine the parameters of the filter. In the process of searching, deductive reasoning is used and many assumptions are made to make some special cases analytically tractable. In fact, the KF was derived under the assumptions that $f$ and $h$ are linear in $x(t)$, $G$ does not depend on $x(t)$, and $\xi(t)$ and $\epsilon(t)$ are Gaussian sequences. The model, (1) and (2), contains such assumptions as the Markov property, Gaussian distribution, and additive measurement noise. When enough additional assumptions are made to derive explicit filter equations, these assumptions are usually so restrictive and/or unrealistic that they prevent the filter equations from much real-world application.
When not enough additional assumptions are made, the analysis involved is so deep and complicated that it leads mostly to mathematical formulas and equations that are not ready for designing or implementing a real filter. This state of the art is reflected in V. Krishnan, Nonlinear Filtering and Smoothing: An Introduction to Martingales, Stochastic Integrals and Estimation, John Wiley & Sons (1984) and R. S. Liptser and A. N. Shiryayev, Statistics of Random Processes I: General Theory and II: Applications, Springer-Verlag (1977). In the few cases where the assumptions are not so restrictive and explicit filtering algorithms are available, these filtering algorithms involve such an enormous amount of computation that their real-time implementation is prohibitively expensive if not impossible. Some examples of such cases can be found in R. S. Bucy and K. D. Senne, "Digital Synthesis of Non-linear Filters," Automatica, Vol. 7, pp. 287-298, 1971, J. T.-H. Lo, "Optimal Estimation for the Satellite Attitude using Star Tracker Measurements," Automatica, Vol. 22, pp. 477-482, 1986, and J. T.-H. Lo and S. K. Ng, "Optimal Fourier-Hermite Expansion for Estimation," Stochastic Processes and Their Applications, Vol. 21, No. 2, pp. 21-35, 1987.
Because of the inherent inaccuracies and frequent failures of the EKF and IEKF and the restrictive and unrealistic assumptions and prohibitive computational requirements of other existing filters, new filters are needed that consistently yield a high degree of estimation accuracy vis-a-vis the information contained in the measurements about the signal and that can be applied in a large variety of real-world situations.
Recent years have seen a rapid growth in the development of artificial neural networks (ANNs), which are also known as connectionist models, parallel distributed processors, neuroprocessors, and neurocomputers. Being crude mathematical models of theorized mind and brain activity, ANNs exploit the massively parallel processing and distributed information representation properties that are believed to exist in a brain. A good introduction to ANNs can be found in R. Hecht-Nielsen, Neurocomputing, Addison-Wesley (1990) and J. Hertz, A. Krogh and R. G. Palmer, Introduction to the Theory of Neural Computation, Addison-Wesley (1991).
There are a large number of ANN paradigms such as Hopfield networks, high-order networks, counter-propagation networks, bidirectional associative memories, piecewise linear machines, neocognitrons, self-organizing feature maps, adaptive resonance theory networks, Boltzmann machines, multilayer perceptrons (MLPs), MLPs with various feedback structures, other recurrent neural network paradigms, etc. These and other ANN paradigms have been applied to systems control (e.g., D. A. White and D. A. Sofge, editors, Handbook of Intelligent Control, Van Nostrand Reinhold (1992)), signal processing (e.g., B. Kosko, editor, Neural Networks for Signal Processing, Prentice Hall (1992)), speech processing (e.g., D. P. Morgan and C. L. Scofield, Neural Networks and Speech Processing, Kluwer Academic Publishers (1991)), and others (e.g., E. Sanchez-Sinencio and C. Lau, editors, Artificial Neural Networks, IEEE Press (1992)).
There are many research articles concerning applications of ANNs, most of which can be found in the foregoing books, journals (e.g., IEEE Transactions on Neural Networks, Neural Networks, and Neural Computation), and conference proceedings (e.g., Proceedings of the International Joint Conference on Neural Networks). Application of one of the aforementioned neural network paradigms to optimal filtering was reported in S. I. Sudharsanan and M. K. Sundareshan, "Maximum A Posteriori State Estimation: A Neural Processing Algorithm," Proceedings of the 28th Conference on Decision and Control, pp. 1805-1806 (1989). The signal and measurement processes considered therein are described by the linear-Gaussian model and the neural network used is a Hopfield network with the neural activation function slightly modified. The connection weights and neuron biases for the network are determined by using the KF equations so that when the Hopfield network stabilizes at each time point, the stable state is the minimum variance estimate. The usefulness of the method is very limited for two reasons: it can only be applied to the linear-Gaussian model, for which the KF equations are available, and the weights and biases of the Hopfield network need to be updated during its operation by other means using the KF equations.
There are also many patent documents concerning the applications of ANNs. Only the two that seem the most relevant to the present invention are mentioned here. In U.S. Pat. No. 5,003,490 to P. F. Castelaz and D. E. Mills (1991), a multilayer perceptron with a sigmoid activation function and a tapped delay line for the input is used to classify input waveforms. In U.S. Pat. No. 5,150,323 to P. F. Castelaz (1992), a multilayer perceptron with a sigmoid activation function and a couple of tapped delay lines for preprocessed inputs is used for in-band separation of a composite signal into its constituent signals.
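For readers unfamiliar with the general arrangement, the following is a generic sketch of a tapped delay line feeding a multilayer perceptron with sigmoid activations. It is not taken from either cited patent. The class and function names, the network size, and the weights are hypothetical and chosen only for illustration.

```python
import math
from collections import deque


class TappedDelayLine:
    """Holds the most recent num_taps samples, newest first."""

    def __init__(self, num_taps):
        self.taps = deque([0.0] * num_taps, maxlen=num_taps)

    def push(self, sample):
        # appendleft on a full bounded deque discards the oldest sample.
        self.taps.appendleft(sample)
        return list(self.taps)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mlp_forward(inputs, hidden_weights, hidden_biases,
                output_weights, output_bias):
    """Forward pass of an MLP with one hidden layer and one sigmoid output."""
    hidden = [sigmoid(sum(w * u for w, u in zip(row, inputs)) + b)
              for row, b in zip(hidden_weights, hidden_biases)]
    return sigmoid(sum(w * v for w, v in zip(output_weights, hidden))
                   + output_bias)


if __name__ == "__main__":
    line = TappedDelayLine(num_taps=3)
    # Hypothetical, untrained weights: 3 inputs, 2 hidden neurons, 1 output.
    hidden_weights = [[0.5, -0.2, 0.1], [-0.3, 0.4, 0.2]]
    hidden_biases = [0.0, 0.1]
    output_weights = [1.0, -1.0]
    for sample in [0.2, 0.7, -0.1, 0.4]:
        window = line.push(sample)
        print(window, mlp_forward(window, hidden_weights, hidden_biases,
                                  output_weights, 0.0))
```

The delay line gives the network a fixed-length window of past inputs. The network itself has no internal dynamic state, so anything older than the window is unavailable to it.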