This application pertains to the art of artificial intelligence networks and more particularly to the art of neural networks.
The invention is particularly applicable to pattern recognition systems and will be described with particular reference thereto, although it will be appreciated that the invention has broader application.
During the last five years, there have been significant new developments in the technology for processing of pattern formatted information.
Depending on the original intent of the development and the discipline of the researchers, these new developments are often referred to as parallel distributed processing, neural-nets or connectionist-nets. There are no sharp dividing lines and one common aspect of all such developments is interest in the structure and capabilities of networks of rather simple processors connected in manners reminiscent of human biological neural nets.
New developments were originally in the form of a few new algorithms and some promising demonstrations of the capabilities of this new approach towards processing pattern information.
Three algorithms are particularly important in that they have focused interest on three important and seemingly different task areas, these being:
(1) supervised learning of associated input/output pairs and subsequent recognition of further unassociated inputs,
(2) unsupervised learning or clustering of a body of unlabeled input patterns on the basis of some metric (concept discovery), and finally
(3) associative storage and retrieval of the original patterns or associated patterns, even if the recall cue is only a distorted version of one of the originally stored patterns.
A development by Rumelhart, Hinton, and Williams, referred to as a feedforward semi-linear net based on back-propagation of error, is a prime example of the algorithms which fall into the first category. In such a net, the processing nodes are non-linear, and the links therebetween are linear. The nodes are arranged in a series of layers, progressing from lower layers to upper layers. An output from a node is multiplied by a weight and fed forward to a summing junction at an input of a node at a subsequent upper layer. Each lower layer node is generally connected to all the nodes at the next higher layer.
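The feed-forward pass described above can be sketched as follows. This is only an illustrative sketch: the logistic (sigmoid) node function, the absence of bias terms, and the names `sigmoid`, `forward`, and `layers` are assumptions made for the example and are not taken from the cited work.

```python
import math


def sigmoid(x):
    # Assumed non-linear node function (logistic); other semi-linear choices are possible.
    return 1.0 / (1.0 + math.exp(-x))


def forward(layers, inputs):
    """Propagate an input pattern from the lowest layer to the highest.

    layers[k][j][i] is the (hypothetical) weight on the link from node i
    of layer k to node j of layer k + 1. Returns the outputs of every layer.
    """
    activations = [list(inputs)]
    for weights in layers:
        lower = activations[-1]
        upper = []
        for row in weights:
            # Linear links: each lower-layer output is multiplied by a weight
            # and accumulated at the summing junction of the upper node.
            net = sum(w * o for w, o in zip(row, lower))
            # Non-linear node: the summed input is squashed by the node function.
            upper.append(sigmoid(net))
        activations.append(upper)
    return activations
```

Calling `forward(layers, pattern)[-1]` yields the outputs of the top layer; the intermediate entries are the "hidden" layer outputs, which a learning procedure would need in order to adjust the weights.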
Learning in such nets is accomplished by adjustment of the weights until a single set of weights is capable of transforming each and all of the training set input patterns into the appropriate associated output pattern. The net has then "learned" the classifications and is desirably capable of classifying all other patterns in the domain of the training set. The procedure for adjusting the weights is also called the Generalized Delta Rule.
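A single weight adjustment of the kind just described might look like the sketch below, for a net with one hidden layer. It assumes logistic nodes, a squared-error measure, no bias terms, and a hypothetical learning rate `eta`; the function and variable names are illustrative only.

```python
import math


def sigmoid(x):
    # Assumed logistic node function.
    return 1.0 / (1.0 + math.exp(-x))


def train_step(w_hidden, w_out, x, target, eta=0.5):
    """Apply one weight adjustment for one input/output pair (weights change in place).

    Returns the squared error measured before the adjustment.
    """
    # Forward pass through the hidden layer and the output layer.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_out]

    # Output-layer deltas: error times the derivative of the node function.
    delta_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, output)]

    # Hidden-layer deltas: output deltas propagated back through the
    # (not yet updated) output weights.
    delta_hidden = [
        h * (1.0 - h) * sum(delta_out[k] * w_out[k][j] for k in range(len(w_out)))
        for j, h in enumerate(hidden)
    ]

    # Each weight moves in proportion to the delta at its upper node
    # and the output of its lower node.
    for k, row in enumerate(w_out):
        for j in range(len(row)):
            row[j] += eta * delta_out[k] * hidden[j]
    for j, row in enumerate(w_hidden):
        for i in range(len(row)):
            row[i] += eta * delta_hidden[j] * x[i]

    return 0.5 * sum((t - o) ** 2 for t, o in zip(target, output))
```

In practice the step would be repeated over every pattern in the training set until the error is acceptably small for all of them.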
This type of net is useful under appropriate conditions. One limitation is that it is adequate only for input patterns with a small number of components. Otherwise learning is extremely slow, and sometimes convergence towards small system errors is simply not obtained. Increasing the number of nodes in the "hidden" layers or increasing the number of layers helps only to a point, after which performance deteriorates further. A type of "noise" hinders learning in such systems.
A higher order connectionist-net, based on a more complicated network structure and a more complex algorithm (the MetaGeneralized Delta Rule), has been demonstrated to result in higher learning rates.
Unsupervised learning algorithms are associated with the names of Grossberg and Carpenter, Kohonen, and Amari, although non-neural net algorithms such as ISODATA have been known for a long time and are used widely.
The Adaptive Resonance Theory (ART) networks of Grossberg and Carpenter are perhaps best viewed as aspects, and only aspects, of more generalized theories regarding human behavior. Insofar as their network algorithmic aspects are concerned, the idea is to organize a set of input patterns in accordance with how a pattern fits or does not fit a leader or prototype. However, as a pattern becomes accepted in a cluster, it in turn modifies the concept of that "prototype" somewhat. Two different versions of such nets exist. They are called ART 1 and ART 2 and are appropriate for discrete binary and continuous valued inputs respectively. This organizational step is important in any pattern information processing. It is interesting and important to note that the nets used in ART 1 and ART 2, and indeed in Kohonen's work, are "flat" in the sense that there are no "hidden" layers. Such ART networks are often said to be susceptible to noise.
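The leader or prototype idea can be illustrated with a simple leader-follower clustering sketch. This is not the ART 1 or ART 2 algorithm itself; the Euclidean distance metric, the acceptance `threshold`, and the update `rate` are all assumptions chosen for the example.

```python
import math


def leader_cluster(patterns, threshold, rate=0.5):
    """Assign each pattern to the nearest prototype, or start a new cluster.

    Returns (prototypes, labels), where labels[n] is the cluster index
    given to patterns[n].
    """
    prototypes = []
    labels = []
    for p in patterns:
        # Find the closest existing prototype, if any.
        best, best_d = None, None
        for idx, proto in enumerate(prototypes):
            d = math.dist(p, proto)
            if best_d is None or d < best_d:
                best, best_d = idx, d
        if best is not None and best_d <= threshold:
            # The pattern fits the leader: accept it, and let it shift
            # the prototype somewhat towards itself.
            proto = prototypes[best]
            prototypes[best] = [q + rate * (x - q) for q, x in zip(proto, p)]
            labels.append(best)
        else:
            # The pattern fits no leader: it becomes a new prototype.
            prototypes.append(list(p))
            labels.append(len(prototypes) - 1)
    return prototypes, labels
```

The result depends on the order in which patterns are presented and on the chosen threshold, which is one reason such single-pass schemes can be sensitive to noisy inputs.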
Another model, referred to as the Hopfield net, is of limited practical use. The efficiency of storage is small, and at any storage level the incidence of errors may be high depending on the nature of the distortion in the input pattern. Some of the error characteristics can be helped through simulated "annealing," and limitations to storage capacity can be alleviated through use of "higher order" links, though at the cost of substantially increased computing burdens. In addition, the original Hopfield net is only auto-associative in the sense that a distorted pattern can only retrieve a corrected version of itself, but cannot retrieve another dissimilar pattern with which it had been "associated" during storage. This is a severe functional limitation. Tank and Hopfield, Kleinfeld, and others have attempted to deal with this by associating a set of (Tij) links with any specific set of (tij) links so that onset of a specific cue pattern X' causes the system to recognize it correctly as X and also to trigger an evolution of the net to X, a hetero-associated pattern.
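The auto-associative behavior described above can be sketched as follows. The sketch assumes bipolar (+1/-1) patterns, outer-product (Hebbian) storage, and a fixed-order asynchronous update; the names `store` and `recall` and the link matrix `t` are illustrative, and the hetero-associative extensions mentioned above are not shown.

```python
def store(patterns):
    """Build a symmetric link matrix from bipolar patterns (outer-product rule)."""
    n = len(patterns[0])
    t = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-links
                    t[i][j] += p[i] * p[j]
    return t


def recall(t, cue, max_sweeps=10):
    """Let the net evolve from a (possibly distorted) cue until no node changes."""
    state = list(cue)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            # Each node takes the sign of its weighted input from the other nodes.
            net = sum(t[i][j] * state[j] for j in range(len(state)))
            new = 1 if net >= 0 else -1
            if new != state[i]:
                state[i] = new
                changed = True
        if not changed:
            break
    return state
```

With a small number of well-separated stored patterns, a cue that differs from a stored pattern in a single component would typically settle back to that pattern; it cannot settle to a different pattern that was merely "associated" with it.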
Most of the present day neural net computing is being achieved with simulated parallel processing. The three above-noted types of algorithms and net implementations are, however, quite far apart.
However, significant pattern information processing tasks generally involve all three types of processing. Sets of complex patterns need to be stored in associated manners suitable for retrieval through cues; concepts for the basis of learning and information storage need to be learned (even inferred) through unsupervised self-organizing procedures; and finally "meaning" has to be given to the organized clusters and to the established associations through supervised learning.
The present invention contemplates a new and improved system of neural networks which overcomes all of the above-referenced problems, and others, and provides a unified system for accomplishing neural network functions which have heretofore been accomplished by independent, incompatible systems.