1. Field of the Invention
This invention relates to an heuristic processor, i.e. a digital processor designed to estimate unknown results by an empirical self-learning approach based on knowledge of prior results.
2. Discussion of Prior Art
Heuristic digital processors are not known per se in the prior art, although there has been considerable interest in the field for many years. Such a processor is required to address problems for which no explicit mathematical formalism exists to permit emulation by an array of digital arithmetic circuits. A typical problem is the recognition of human speech, where it is required to deduce an implied message from speech which is subject to distortion by noise and the personal characteristics of the speaker. In such a problem, it will be known that a particular set of sound sequences will correspond to a set of messages, but the mathematical relationship between any sound sequence and the related message will be unknown. Under these circumstances, there is no direct method of discerning an unknown message from a new sound sequence.
The approach to solving problems lacking known mathematical formalisms has in the past involved use of a general purpose computer programmed in accordance with a self-learning algorithm. One form of algorithm is the so-called linear perceptron model. This model employs what may be referred to as training information from which the computer "learns", and on the basis of which it subsequently predicts. The information comprises "training data" sets and "training answer" sets to which the training data sets respectively correspond in accordance with the unknown transformation. The linear perceptron model involves forming differently weighted linear combinations of the training data values in a set to form an output result set. The result set is then compared with the corresponding training answer set to produce error values. The model can be envisaged as a layer of input nodes broadcasting data via varying strength (weighted) connections to a layer of summing output nodes. The model incorporates an algorithm to operate on the error values and provide corrected weighting parameters which (it is hoped) reduce the error values. This procedure is carried out for each of the training data and corresponding training answer sets, after which the error values should become small indicating convergence.
At this point data for which there are no known answers are input to the computer, which generates predicted results on the basis of the weighting scheme it has built up during the training procedure. It can be shown mathematically that this approach is valid and yields convergent results for problems where the unknown transformation is linear. The approach is described in Chapter 8 of "Parallel Distributed Processing Vol. 1: Foundations", pages 318-322, D. E. Rumelhart, J. L. McClelland, MIT Press 1986.
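The training and prediction procedure described above may be illustrated by the following minimal sketch. It is not taken from the cited reference. The function names, the learning rate, the number of passes and the delta-rule form of the weight correction are all assumptions made for illustration, and `hidden_transformation` is a hypothetical stand-in for the unknown linear transformation.

```python
# Illustrative sketch only: names, learning rate and pass count are assumptions.

def predict(weights, data):
    """Each output node sums its weighted inputs (one weight row per output node)."""
    return [sum(w * d for w, d in zip(row, data)) for row in weights]


def train(data_sets, answer_sets, rate=0.05, passes=200):
    """Correct the weights from the error values, one training set at a time."""
    n_in, n_out = len(data_sets[0]), len(answer_sets[0])
    weights = [[0.0] * n_in for _ in range(n_out)]
    for _ in range(passes):
        for data, answer in zip(data_sets, answer_sets):
            result = predict(weights, data)
            for j in range(n_out):
                error = answer[j] - result[j]
                for i in range(n_in):
                    # Delta-rule correction (an assumed choice of algorithm).
                    weights[j][i] += rate * error * data[i]
    return weights


# Hypothetical "unknown" linear transformation used to generate training answers.
def hidden_transformation(data):
    return [2.0 * data[0] - data[1], 0.5 * data[0] + 3.0 * data[1]]


training_data = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
training_answers = [hidden_transformation(d) for d in training_data]

weights = train(training_data, training_answers)
prediction = predict(weights, [3.0, 2.0])  # data with no known answer
```

Because the assumed transformation is linear and the training answers are free of noise, the error values shrink on every pass and the prediction for the new data approaches the value the transformation itself would give.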
For problems involving unknown nonlinear transformations, the linear perceptron model produces results which are quite wrong. A convenient test for such a model is the EX-OR problem, i.e. that of producing an output map of a logical exclusive-OR function. The linear perceptron model has been shown to be entirely inappropriate for the EX-OR problem because the latter is known to be nonlinear. In general, nonlinear problems are considerably more important than linear problems.
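The failure on the EX-OR problem may be seen in a short sketch. A single linear output node is fitted to the four EX-OR cases by repeated weight correction. The constant bias input, the learning rate, the step count and the use of a batch correction (summing over all four cases before each update) are assumptions made for illustration.

```python
# EX-OR training sets; the trailing 1.0 is an assumed constant bias input.
training_data = [[0.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
training_answers = [0.0, 1.0, 1.0, 0.0]


def output(weights, data):
    """Single linear output node: a weighted sum of the inputs."""
    return sum(w * d for w, d in zip(weights, data))


def fit(data_sets, answers, rate=0.1, steps=2000):
    """Batch correction of the weights from the summed error values."""
    weights = [0.0] * len(data_sets[0])
    for _ in range(steps):
        errors = [a - output(weights, d) for d, a in zip(data_sets, answers)]
        for i in range(len(weights)):
            weights[i] += rate * sum(e * d[i] for e, d in zip(errors, data_sets))
    return weights


weights = fit(training_data, training_answers)
results = [output(weights, d) for d in training_data]
```

Under these assumptions the weights settle at the least-squares compromise, and every result lies close to 0.5 whatever the inputs, so the output map carries no information about the exclusive-OR function.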
In an attempt to treat nonlinear problems, the linear perceptron model has been modified to introduce nonlinear transformations and at least one additional layer of nodes referred to as a hidden layer. This provides the nonlinear multilayer perceptron model. It may be considered as a layer of input nodes broadcasting data via varying strength (weighted) connections to a layer of internal or "hidden" summing nodes, the hidden nodes in turn broadcasting their sums to a layer of output nodes via varying strength connections once more. (More complex versions may incorporate a plurality of successive hidden layers.) Nonlinear transformations may be performed at any one or more layers. A typical transformation involves computing the hyperbolic tangent of the input to a layer. Apart from these one or more transformations, the procedure is similar to the linear equivalent. Errors between training results and training answers are employed to recompute weighting factors applied to inputs to the hidden and output layers of the perceptron. The disadvantages of the nonlinear perceptron approach are that there is no guarantee that convergence is obtainable and, where it is obtainable, no guarantee that it will occur in a reasonable length of computer time. The computer programme may well converge on a false minimum remote from a realistic solution to the weight determination problem. Moreover, convergence takes an unpredictable length of computer time, anything from minutes to many hours. It may be necessary to pass many thousands of training data sets through the computer model.
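The layered structure may be illustrated by a forward pass through a small multilayer perceptron with one hidden layer of two nodes and a hyperbolic tangent transformation at the hidden layer. The weights below are hand-picked for illustration and are not learned, so the sketch shows only that a hidden layer can represent the EX-OR map. It says nothing about whether or how quickly a training procedure would arrive at such weights. The bias input and all names are assumptions.

```python
import math

# Hand-picked weights, assumed for illustration only (not learned).
HIDDEN_WEIGHTS = [[10.0, 10.0, -5.0],   # responds when either input is on
                  [10.0, 10.0, -15.0]]  # responds only when both inputs are on
OUTPUT_WEIGHTS = [0.5, -0.5]


def forward(x1, x2):
    """Input layer -> tanh hidden layer -> linear summing output node."""
    inputs = [x1, x2, 1.0]  # trailing 1.0 is an assumed bias input
    hidden = [math.tanh(sum(w * v for w, v in zip(row, inputs)))
              for row in HIDDEN_WEIGHTS]
    return sum(w * h for w, h in zip(OUTPUT_WEIGHTS, hidden))


# Output map over the four input cases, rounded to the nearest integer.
xor_map = {(a, b): round(forward(a, b)) for a in (0, 1) for b in (0, 1)}
```

The output node takes half the difference between the "either input" and "both inputs" hidden nodes, which is near unity only when exactly one input is on.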