A large class of problems, such as speech recognition, handwriting recognition, robotic control, and function fitting, is difficult to solve, or remains unsolved, with conventional computing methods. These problems can, however, be cast as pattern-classification or optimization problems, for which neural network methods have shown promise.
A neural network is a type of computer or processor structure that resembles the human brain, in that data is processed in a multi-layered system of interconnected "nodes" or "neurons," each of which might be a set of memory cells or even a group of individual processors.
Conventional computers are programmed in a more or less completely non-adaptive manner, so their ability to recognize patterns or a common structure in an input data stream depends entirely on how expansive, detailed, and predictive their programs are. In contrast, a neural network begins with interconnected nodes having initial interconnection values and biases, and it develops its own "program" through "training." Training normally involves presenting the network with a large number of training patterns with known values. The network's output is evaluated, and "mistakes" cause the network to adjust its internal parameters and interconnections in order to improve its performance. In other words, the network "learns," and its performance typically improves as it is "trained."
As an example, assume a neural network is to be trained to distinguish between digitized images representing the letters "A" and "B." In this case, the network has two outputs, namely, "A" and "B." When the network is presented with an image of an "A," it is to recognize this and activate the output "A." To train the network, a stream of "A" and "B" images is input to the network. For each input symbol, the network analyzes the input data and indicates whether it received an "A" or a "B." Every time the network mistakes an input "A" for a "B" (or vice versa), it is told that it has made a mistake, and it can then adjust the values of the neural connections and biases so as to reduce the probability of making the same mistake again. In other words, a neural network uses a "feedback learning" procedure to adjust its internal evaluation parameters. Even for systems that need to recognize only small or well-defined input sets, known neural networks require long training times: a very large number of training runs must be made before the network learns.
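The feedback procedure described above can be sketched as a two-output perceptron. This is a minimal illustration under stated assumptions, not the method of this invention: the 3x3 bitmaps in `TRAINING_SET`, the names `classify` and `train`, the all-zero starting values, and the update rule (the standard multiclass perceptron rule, which adds the input to the correct output's weights and subtracts it from the wrongly chosen output's weights) are all hypothetical choices made for illustration.

```python
# Hypothetical 3x3 bitmaps (1 = ink) standing in for digitized images of "A" and "B".
TRAINING_SET = [
    ([0, 1, 0,
      1, 1, 1,
      1, 0, 1], "A"),
    ([0, 1, 0,
      1, 0, 1,
      1, 1, 1], "A"),
    ([1, 1, 0,
      1, 1, 1,
      1, 1, 0], "B"),
    ([1, 1, 1,
      1, 1, 0,
      1, 1, 1], "B"),
]
LABELS = ("A", "B")  # one output unit per letter


def classify(weights, biases, x):
    """Activate the output with the highest weighted sum (ties go to "A")."""
    def score(label):
        return sum(w * xi for w, xi in zip(weights[label], x)) + biases[label]
    return max(LABELS, key=score)


def train(training_set, rate=1.0, max_epochs=100):
    """Feedback learning: adjust weights and biases only when a mistake is made."""
    n = len(training_set[0][0])
    weights = {label: [0.0] * n for label in LABELS}
    biases = {label: 0.0 for label in LABELS}
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for x, target in training_set:
            guess = classify(weights, biases, x)
            if guess != target:  # the network is told it made a mistake
                mistakes += 1
                for i, xi in enumerate(x):
                    weights[target][i] += rate * xi  # strengthen the correct output
                    weights[guess][i] -= rate * xi   # weaken the wrongly chosen output
                biases[target] += rate
                biases[guess] -= rate
        if mistakes == 0:  # a full pass with no mistakes: stop training
            return weights, biases, epoch
    return weights, biases, max_epochs


weights, biases, epochs = train(TRAINING_SET)
for x, target in TRAINING_SET:
    print(target, "->", classify(weights, biases, x))
```

Because these four toy patterns are linearly separable, the loop stops after a few passes; realistic image sets are far larger and, as the text notes, typically require many more training runs.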
One of the foremost causes of long training times for existing neural networks is that the elements of the input set, that is, the group of different letters, entire words, sounds, pictures, symbols, and other patterns or data, do not contain enough information about the proper values of the neural interconnections and biases to enable the network to make "good guesses." (The notion of a "good guess" is defined more precisely below.) In practice, this means that many neural networks are trained starting from random values for the interconnections and biases. Thousands of runs through thousands of different input symbols are not uncommon before conventional neural networks learn to recognize the input set with an acceptable degree of accuracy.
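To illustrate what starting from random values means in practice, the sketch below trains a single threshold unit from randomly chosen interconnection weights and bias and reports how many passes through the training set it needs. It is a hypothetical toy: the logical-OR task, the learning rate of 0.1, the uniform initial values in [-1, 1], and the names `predict` and `epochs_to_converge` are all assumptions for illustration, and the example is not intended to reproduce the training times of real networks.

```python
import random

# Hypothetical toy task: logical OR of two binary inputs (linearly separable).
DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]


def predict(w, b, x):
    """Threshold unit: output 1 if the weighted sum plus bias is positive."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def epochs_to_converge(seed, rate=0.1, max_epochs=1000):
    """Train from random initial values; return (passes needed, weights, bias)."""
    rng = random.Random(seed)
    # Interconnection weights and bias start at random values in [-1, 1].
    w = [rng.uniform(-1.0, 1.0) for _ in range(2)]
    b = rng.uniform(-1.0, 1.0)
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for x, target in DATA:
            err = target - predict(w, b, x)  # +1, 0, or -1
            if err:
                errors += 1
                # Perceptron update: nudge weights and bias toward the target.
                w = [wi + rate * err * xi for wi, xi in zip(w, x)]
                b += rate * err
        if errors == 0:  # a full pass with no mistakes: training has converged
            return epoch, w, b
    raise RuntimeError("did not converge within max_epochs")


for seed in range(5):
    passes, _, _ = epochs_to_converge(seed)
    print(f"seed {seed}: {passes} passes")
```

Different seeds give different starting points and therefore, in general, different numbers of passes. On a problem this small the counts stay small; the point made in the text is that for realistic input sets, such uninformed starting values lead to very long training.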
To make a human analogy, assume that a person is trying to learn German. If this beginner is presented with the words "Stute," "Hengst," "Fohlen," and "Pferd" without further knowledge, she will have to analyze and look up each word one at a time, and it will take a long time and many mistakes before she masters them. The learning process, however, would be sped up greatly if she knew in advance that all these words refer to horses (i.e., "mare," "stallion," "foal," and "horse," respectively). When she is later confronted with the word "Wallach," if she is told that this word belongs to the same or a very similar class of words (it means "gelding"), she will not make a large number of "wild guesses" before learning the new word. By "bounding" the input class, the learning process is made much quicker.
Another major disadvantage of long learning times is that they make it difficult or impossible for neural networks to work in real time. If a slow-learning neural network encounters a symbol or pattern it does not recognize, there may not be enough time to retrain the network to incorporate the new symbol. Moreover, if a neural network takes a long time to converge, that is, to decide which pattern it has before it, it may be too slow to be of practical use. A text recognition system that can read only two words per minute would, of course, be of limited use in helping the blind to read books printed in, for example, type fonts that the network has not previously encountered.
Furthermore, standard neural networks need more neurons and interconnections to learn more complicated problems, so the memory and training-time requirements may become prohibitive for very large-scale problems. Consequently, it is also important to make networks more efficient, that is, to use fewer nodes and interconnections. A goal in the field of neural network design is therefore to increase both the learning speed and the accuracy of neural networks.
Yet another shortcoming of existing networks is that when they are to recognize a new pattern (for example, a new type font) that they have not already been trained to recognize, they must be retrained from scratch. Conventional networks are thus not "modular," in that they cannot establish proper weights and biases for new patterns separately from those already established for earlier training patterns.
Examples of developments in neural network research are found in "Neurocomputing: Foundations of Research," edited by James A. Anderson and Edward Rosenfeld; "A Design For An Associative Spin Glass Processor," by James M. Goodwin, Bruce E. Rosen, and Jacques J. Vidal, and the associated U.S. Pat. No. 4,977,540, "Spin Glass Type Associative Processor System" (Goodwin et al., Dec. 11, 1990); "Optical Neural Computers," by Yaser S. Abu-Mostafa and Demetri Psaltis (Scientific American, March 1987); U.S. Pat. No. 3,887,906, "Optical Associative Memory Using Complementary Magnetic Bubble Shift Registers" (Minnaja, Jun. 3, 1975); and "A Learning Algorithm for Boltzmann Machines," by David H. Ackley and Geoffrey E. Hinton (Cognitive Science, Vol. 9, pp. 147-169, 1985).
The objects of this invention are to provide a neural network that requires much less training time than existing neural networks while maintaining the ability to find an optimal solution; to make the network more efficient, using fewer nodes and weights; and to make the network "modular," so that a new pattern can be learned without retraining the network from scratch.