The present invention is directed to a neural network and neural network processing elements which are based on probability and, more particularly, is directed to a network with a neural node which produces an output as a product of linearly transformed or power series expanded input signals.
A neural network consists of a multitude of massively, sometimes completely, interconnected processing elements. The interconnection of these processing elements may be structured, but this is not a necessary requirement for these arrangements to be referred to as neural networks. The processing elements can be organized into layers, columns, trees, rings, stars, etc., depending on the problem to be solved and the available resources. All processing elements in a neural network need not be identical. This feature allows processing element configurations to be specialized to perform some specific functions within the network, such as input or output functions. Conventional neural networks 8 are formed from processing elements which deal with boolean signals and are generally structured in layers as illustrated in FIG. 1. The signals can represent the elements of a picture in a pattern recognition network, the states of devices in a process control system network, or any other signals or values which are operated on by neural networks and expert systems. Such a network consists of several layers: an input layer 10 which includes plural input nodes 12 and an output layer 14 which includes plural output nodes 16. In order to solve complex problems, that is, problems which are not linearly separable, the conventional neural network 8 usually also includes one or more layers 18-22 between the input 10 and output 14 layers which, because of their location within the network, are often referred to as hidden layers. Each of the hidden layers also includes nodes 24 and 26. In principle, there can be more than one hidden layer in the network 8. Additional hidden layers increase the amount of serial processing in the network, but can clarify the resultant solution produced by the network 8.
It has been theoretically shown that any arbitrary function can be formed or problem solved in a network with only a single hidden layer; therefore, the conventional three layer neural network has all the features necessary for developing fault tolerant solutions to any arbitrary problem. The conventional neural network 8 also includes a training layer 28 with training nodes 30 which allow the neural network 8 to learn the patterns which are to be recognized. Conventional neural network processing is simply based on taking the inner product of a weight vector and the input vector and testing this value against some threshold. FIG. 2 illustrates the conventional neural network processing element structure or node where this type of processing element is used in all layers of the network including the input and output layers. This node 40 includes a linkage element 42 or algorithm which produces a linkage output signal $y_i$ by multiplying the linkage input signal $x_i$ by a weight $A_i$ as illustrated in equation 1:

$$y_i = A_i x_i \qquad (1)$$
The linkage weight $A_i$ can be either positive or negative depending on whether the connection between the processing elements is excitatory or inhibitory. These linkage output signals $y_i$ are summed and compared to a threshold $\Theta$ as illustrated in equation 2:

$$y = \sum_i y_i = \sum_i A_i x_i \geq \Theta \qquad (2)$$

This conventional algorithm allows one to reason with weighted information, a classical method of decision making. Such an algorithm is commonly referred to as a weighted summation and each node is sometimes called a perceptron because of its ability to perceive patterns. The weighted summation algorithm, by itself, is a statistical procedure usually used to consolidate a multitude of related raw data into a single datum. This process is very useful in performing multi-sensor fusion at the input layer 10 of a neural network. This process is also very useful in formulating a consensus from several decision making sources at the output layer 14 of a neural network. Since conventional neural networks deal only with boolean signals, the result of this consolidation by weighted summation must be transformed back into a boolean signal. This is done in the above algorithm by comparing the weighted summation $y$ with the threshold $\Theta$, and returning a 0 (false) if the inequality is invalid or a 1 (true) if the inequality is valid. The simple threshold function described above can be replaced with a more complex threshold function, such as a sigmoid function, which the transfer function comparison element 46 applies to the output of the summation element 44. During the training phase, various back propagation of error training algorithms have been used by a weight calculation unit 48 to adjust the weights $A_i$ applied to the input signals of each node.
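The conventional node described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation of node 40: the function names are hypothetical, the comparison in equation 2 is assumed to be "greater than or equal to" (the choice that agrees with the threshold values listed in Table 1), and the sigmoid variant is assumed to be a logistic function centered on the threshold.

```python
import math

def linkage_outputs(inputs, weights):
    """Equation 1: y_i = A_i * x_i for each input line (hypothetical helper)."""
    return [a * x for a, x in zip(weights, inputs)]

def threshold_node(inputs, weights, theta):
    """Equation 2, assuming a >= comparison: 1 (true) if sum(y_i) >= theta, else 0."""
    y = sum(linkage_outputs(inputs, weights))
    return 1 if y >= theta else 0

def sigmoid_node(inputs, weights, theta):
    """Assumed soft variant: logistic function of (y - theta) instead of a hard step."""
    y = sum(linkage_outputs(inputs, weights))
    return 1.0 / (1.0 + math.exp(-(y - theta)))

# Two-input AND with weights (1, 1) and threshold 2
print(threshold_node((1, 1), (1, 1), 2))  # 1
print(threshold_node((1, 0), (1, 1), 2))  # 0
```

The hard threshold returns only 0 or 1, whereas the sigmoid variant returns a value between 0 and 1 that equals 0.5 exactly when the weighted summation equals the threshold.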
The function of equation 2 does not include an offset because an offset is redundant. Any offset can simply be absorbed into the threshold value $\Theta$ because the offset enters the comparison by addition. A single summation threshold processing element cannot generate every possible boolean logic function. For example, for the two input case the summation threshold function can only generate 14 (the linearly separable functions) of the 16 possible functions and is incapable of generating either the Exclusive-OR or the Equivalence (Exclusive-NOR) functions, the non-linearly separable functions, as illustrated in Table 1, where Z is the set of values representing (false, true):
TABLE 1
Summation Threshold Case of Two Inputs
__________________________________________________________
                           Z = {0,1}          Z = {-1,1}
Function                  A_x  A_y    Θ      A_x  A_y    Θ
__________________________________________________________
TRUE                        0    0   -1        0    0   -1
FALSE                       0    0    1        0    0    1
x                           1    0    1        1    0    1
y                           0    1    1        0    1    1
NOT x                       1    0    0       -1    0    1
NOT y                       0    1    0        0   -1    1
x AND y                     1    1    2        1    1    1
(NOT x) AND y              -1    1    1       -1    1    1
x AND (NOT y)               1   -1    1        1   -1    1
(NOT x) AND (NOT y)        -1   -1    0       -1   -1    1
x OR y                      1    1    1        1    1   -1
(NOT x) OR y               -1    1    0       -1    1   -1
x OR (NOT y)                1   -1    0        1   -1   -1
(NOT x) OR (NOT y)         -1   -1   -1       -1   -1   -1
x XOR y                    Impossible         Impossible
x EQV y                    Impossible         Impossible
__________________________________________________________
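The count of 14 realizable functions out of 16 can be checked by exhaustive search. The sketch below is an illustration under stated assumptions: the comparison is taken to be "weighted sum greater than or equal to threshold", the weights are restricted to -1, 0 and 1, the thresholds to the integers -2 through 2, and the function name `realizable_functions` is hypothetical.

```python
from itertools import product

def realizable_functions(values, weights=(-1, 0, 1), thetas=(-2, -1, 0, 1, 2)):
    """Map each realizable truth table to one (A_x, A_y, theta) that produces it.

    `values` is the pair used for (false, true), e.g. (0, 1) or (-1, 1).
    The weight and threshold ranges are assumptions of this sketch.
    """
    points = list(product(values, repeat=2))  # all (x, y) input pairs, fixed order
    found = {}
    for a_x, a_y, theta in product(weights, weights, thetas):
        table = tuple(int(a_x * x + a_y * y >= theta) for x, y in points)
        found.setdefault(table, (a_x, a_y, theta))
    return points, found

for values in ((0, 1), (-1, 1)):
    points, found = realizable_functions(values)
    xor = tuple(int(x != y) for x, y in points)
    eqv = tuple(int(x == y) for x, y in points)
    print(values, len(found), xor in found, eqv in found)
# (0, 1) 14 False False
# (-1, 1) 14 False False
```

For both encodings of Z the search finds 14 distinct truth tables, and neither the Exclusive-OR nor the Equivalence function is among them. Under the assumed comparison, the first realization the search finds for NOT x on Z = {0,1} uses a negative weight: A_x = -1, A_y = 0 and a threshold of 0.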
The inability of equation 2 to generate all possible boolean functions is not a practical problem when these elements are used in a network with a plurality of layers, since a combination of these elements is capable of generating any arbitrary logic function. This follows from the fact that a single summation threshold processing element can generate the universal logic element NOR, the universal logic element NAND and the complete set of key primitive boolean logic functions, AND, OR and NOT.
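As an illustration of this point, the Exclusive-OR function, which no single summation threshold element can generate, can be built from three such elements arranged in two layers. The sketch below assumes a "greater than or equal to" comparison and Z = {0,1}; the weights and thresholds for OR, (NOT x) OR (NOT y) and AND are those listed in Table 1, and the function names are hypothetical.

```python
def node(inputs, weights, theta):
    """Summation threshold element: 1 if the weighted sum is >= theta, else 0."""
    return int(sum(a * x for a, x in zip(weights, inputs)) >= theta)

def xor_network(x, y):
    """Two-layer network: XOR = (x OR y) AND ((NOT x) OR (NOT y))."""
    h_or = node((x, y), (1, 1), 1)          # hidden element: x OR y
    h_nand = node((x, y), (-1, -1), -1)     # hidden element: (NOT x) OR (NOT y), i.e. NAND
    return node((h_or, h_nand), (1, 1), 2)  # output element: AND of the hidden outputs

for x in (0, 1):
    for y in (0, 1):
        print(x, y, xor_network(x, y))
# 0 0 0
# 0 1 1
# 1 0 1
# 1 1 0
```

The price of the added capability is an extra layer: two elements operate in parallel on the inputs and a third combines their outputs.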
The number of boolean logic functions which a single conventional processing element can emulate grows rapidly with the number of inputs, as illustrated in the third column of Table 2:
TABLE 2
A Comparison of the Number of Boolean Logic Functions for a
General-Purpose Processing Element Versus a Perceptron
______________________________________________________________
                            Number of Outputs
Number of     General-Purpose            Perceptron
Inputs        Processing Element         Processing Element
______________________________________________________________
1             4                          4
2             16                         14
3             256                        104
4             65,536                     1,882
5             2^32 ≈ 4 × 10^9            93,852
6             2^64 ≈ 10^19               ≈ 1.4 × 10^7
______________________________________________________________
Table 2 compares the maximum possible number of boolean logic functions which a hypothetical general purpose processing element can generate with the actual number of boolean logic functions which a conventional node, or perceptron, can generate (the linearly separable functions), for 1 to 6 inputs. Table 2 illustrates that for greater than three inputs, the hypothetical general purpose processing element generates a substantially larger number of boolean logic functions (as indicated in the general purpose processing element column) than the conventional device (listed in the perceptron column) but does not necessarily generate all the boolean functions. This suggests that a more powerful processing element, such as a hypothetical general purpose processing element, for building compact, high speed neural networks would be one which is specifically tailored to generate non-linearly separable boolean logic functions.
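The first rows of Table 2 can be reproduced with a short calculation. The general purpose count for n inputs is 2 raised to the power 2^n, since each of the 2^n input combinations can independently map to 0 or 1. The perceptron count is obtained below by exhaustive search, as a sketch under stated assumptions: a "greater than or equal to" comparison, inputs in {0,1}, and integer weights of magnitude at most 2, a bound which is adequate for up to three inputs but is not claimed to be adequate beyond that. The function names are hypothetical.

```python
from itertools import product

def count_all_functions(n):
    """Number of boolean functions of n inputs: 2 ** (2 ** n)."""
    return 2 ** (2 ** n)

def count_separable(n, max_weight=2):
    """Count truth tables a single summation threshold element can produce.

    Assumes integer weights in [-max_weight, max_weight] and integer
    thresholds; this search range is an assumption that suffices for n <= 3.
    """
    points = list(product((0, 1), repeat=n))
    weights = range(-max_weight, max_weight + 1)
    thetas = range(-n * max_weight, n * max_weight + 2)
    tables = set()
    for w in product(weights, repeat=n):
        sums = [sum(a * x for a, x in zip(w, p)) for p in points]
        for theta in thetas:
            tables.add(tuple(int(s >= theta) for s in sums))
    return len(tables)

for n in (1, 2, 3):
    print(n, count_all_functions(n), count_separable(n))
# 1 4 4
# 2 16 14
# 3 256 104
```

The gap between the two columns widens quickly: at three inputs fewer than half of the 256 possible functions are available to a single conventional node.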