The task of having a machine learn a pattern sequence that is a linear function of multiple inputs is a central problem in many technical fields, including adaptive control and estimation, signal processing, artificial intelligence, pattern recognition, and neural networks. The machine must track the pattern sequence responsively in real time while achieving fast convergence in a computationally efficient manner. Often the process of learning the pattern sequence is made more difficult because very little is known in advance about the system generating the sequence. Moreover, while the inputs to the machine for learning the pattern may be identified, the relevance and weight of each input in affecting the output pattern sequence are usually not known.
Methods of determining the relevance of a particular input, along with a specific weight, are known. The weights are derived from a modifiable gain parameter. The gain parameter is modified based on the auto-correlation of the increments in the identified input. When the current input increment is positively correlated with a certain average of the preceding input increments, the gain parameter is increased. Conversely, if the input increments are negatively correlated, the gain parameter is decreased. The gain parameters are adjusted to enhance the efficiency and responsiveness of the learning process.
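The correlation-based adjustment described above can be illustrated with a minimal sketch. It is not taken from any of the cited methods: the names `GainState` and `adapt_gain`, the additive-increase and multiplicative-decrease form, and the constants `up`, `down`, and `decay` are all assumptions chosen for illustration. The sketch keeps a decaying average of preceding increments for one input and raises or lowers that input's gain according to whether the current increment agrees in sign with the average.

```python
from dataclasses import dataclass


@dataclass
class GainState:
    """Per-input state (hypothetical structure, for illustration only)."""
    gain: float          # current gain parameter for this input
    trace: float = 0.0   # decaying average of preceding increments


def adapt_gain(state, increment, up=0.01, down=0.1, decay=0.7):
    """Adjust one input's gain from the sign of increment * trace.

    The constants up, down, and decay are illustrative assumptions.
    """
    agreement = increment * state.trace
    if agreement > 0:
        # Increment agrees with recent history: raise the gain.
        state.gain += up
    elif agreement < 0:
        # Increment opposes recent history: lower the gain.
        state.gain *= 1.0 - down
    # Fold the current increment into the decaying average.
    state.trace = (1.0 - decay) * increment + decay * state.trace
    return state.gain
```

A caller would hold one `GainState` per input and call `adapt_gain` once per time step with that input's latest weight increment, so each input settles on its own gain.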
Prior techniques for adapting the gain parameter of an adaptive learning process have been disclosed by Kesten in "Accelerated Stochastic Approximation", Annals of Mathematical Statistics, Vol. 29, 1958, pp. 41-59. The Kesten method reduces gain parameters or moves them along a fixed schedule converging to zero. The method cannot find a gain level appropriate to the dynamics of a non-stationary task and is limited to a single gain parameter for the entire system.
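The limitation noted above can be seen in a small sketch of a schedule-driven gain. This is an illustrative reading of a Kesten-style rule, not a reproduction of the cited paper: the function name `scheduled_gains`, the default schedule 1/(k+1), and the choice to advance the schedule whenever successive increments change sign are assumptions made for the example.

```python
def scheduled_gains(increments, schedule=lambda k: 1.0 / (k + 1)):
    """Return the single system-wide gain used at each step.

    The schedule index k advances only when successive increments
    change sign, so the gain can fall but can never rise again.
    """
    k = 0
    previous = None
    gains = []
    for increment in increments:
        if previous is not None and increment * previous < 0:
            k += 1  # sign change: move one step down the schedule
        gains.append(schedule(k))
        previous = increment
    return gains
```

Because the index `k` never decreases, the gain cannot recover if the task later changes, and because there is only one `k`, every input shares the same gain.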
A method entitled Delta-Bar-Delta (DBD) for accelerating convergence of neural networks is disclosed by Jacobs in "Increased Rates of Convergence Through Learning Rate Adaptation", Neural Networks, vol. 1, 1988, pp. 295-307, by Chan et al. in "An Adaptive Training Algorithm for Back Propagation Networks", Cambridge University Engineering Department Technical Report, CUED/F-INFENG/TR.2, 1987, by Tollenaere in "SuperSAB: Fast Adaptive Back Propagation with Good Scaling Properties", Neural Networks, vol. 3, 1990, pp. 561-573, by Devos et al. in "Self Adaptive Back Propagation", Proceedings NeuroNimes, 1988, EZ, Nanterre, France, and by Lee et al. in "Practical Characteristics of Neural Network and Conventional Pattern Classifiers on Artificial and Speech Problems", Advances in Neural Information Processing Systems, vol. 2, 1990, pp. 168-177. These DBD methods do not operate incrementally and are not dynamic. The methods modify the gain parameters only after a complete pass through the training set and thus cannot be applied to an on-line learning situation.
Classical estimation methods, including the Kalman filter, Least-Squares methods, Least-Mean-Squares (LMS), and normalized LMS, are described by Goodwin et al. in Adaptive Filtering Prediction and Control, Prentice Hall, 1984. These methods can be divided into classes with differing disadvantages. The Kalman filter method offers optimal performance in terms of tracking error, but requires more detailed knowledge of the task domain than is usually available. In particular, it requires complete knowledge of the statistics of the unknown system's time variation. The least-squares methods require less such knowledge, but do not perform as well. In addition, both of these classes of methods require a great deal of memory and computation. If the primary learning process has N parameters, then the complexity of these methods is of the order of N². That is, their memory and computational requirements increase with the square of the number of parameters being estimated. In many applications this number is very large, making these methods undesirable. The LMS and normalized LMS methods are much less complex, requiring memory and computation that are only of order N. However, these methods have slow convergence.
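The order-N cost of the LMS family can be seen from a minimal sketch of one update step in each variant. The function names, the default gains, and the small constant `eps` guarding against division by zero are assumptions made for illustration. Each step touches every weight exactly once and stores nothing beyond the weight vector itself.

```python
def predict(weights, inputs):
    """Linear prediction: the weighted sum of the inputs."""
    return sum(w * x for w, x in zip(weights, inputs))


def lms_step(weights, inputs, target, gain=0.05):
    """One LMS update with a single fixed gain for all weights."""
    error = target - predict(weights, inputs)
    updated = [w + gain * error * x for w, x in zip(weights, inputs)]
    return updated, error


def nlms_step(weights, inputs, target, gain=0.5, eps=1e-8):
    """One normalized LMS update: the gain is divided by the input power."""
    error = target - predict(weights, inputs)
    power = sum(x * x for x in inputs) + eps
    scale = gain / power
    updated = [w + scale * error * x for w, x in zip(weights, inputs)]
    return updated, error
```

In both variants the gain is one fixed number shared by all N weights, which keeps the cost low but gives the learner no way to weight relevant inputs more heavily than irrelevant ones.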
Thus, it is desirable to discover a method of machine learning that achieves fast convergence and responsive tracking of a pattern sequence without excessive computation, system knowledge, or intervention in a real-time system.