Given the number n, a network of elements made according to the present invention with appropriate connections to n logical signals and their complements could, in theory, be constructed in such a way that any one of the 2^(2^n) boolean functions of n variables would be attained for at least one state. The resulting machine would thus be a universal function-learner for n variables. However, the synthesis of any desired boolean function of n variables by a universal function-learner can already be accomplished by the Generalized Self-Synthesizer of P. H. Halpern, U.S. Pat. No. 3,262,101 issued July 19, 1966. For this task, then, the present invention, although applicable, provides no improvement in principle. Furthermore, if the number of variables n is very large, in the hundreds or thousands, any universal function synthesizer whatsoever, including of course one made according to the present invention or Halpern's, is physically impossible to build, due to the great number of states and therefore parts (at least 2^n) required.
Fortunately, many useful functions of a large number n of variables are of very low sensitivity compared to most boolean functions of n variables. Sensitivity of a boolean function f can be precisely defined by a modulus of sensitivity function μ_f which for an integer d ≥ 0 has a value μ_f(d) equal to the probability that f(a) is not equal to f(b) under the condition that a and b vary over all n-tuples which differ in precisely d components. The probabilities are calculated with respect to some probability distribution P given on the set of all n-tuples. Explicitly: μ_f(d) is equal to the sum of P(a)P(b) over all pairs (a, b) such that a and b differ in precisely d components and such that f(a) is not equal to f(b), divided by the sum of P(a)P(b) over all pairs (a, b) such that a and b differ in precisely d components; μ_f(d) is undefined if the latter sum is zero. A function f for which μ_f(d) is "small" when d is "small" will be called "insensitive" or "of low sensitivity". An example of an insensitive function is obtained by considering the classification of handwritten numerals represented by n = 144 logical values in a 12 × 12 array into two classes, say the "six" class and the "non-six" class. A few values changing in the array will not usually change the correct classification of a character. A theoretical treatment of sensitivity is to be found in the following paper, but it is not entirely adequate for the purposes of the present invention: G. V. Bochmann and W. W. Armstrong: Properties of boolean functions with a tree decomposition, BIT 14 (1974), pp. 1-13.
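By way of illustration only, the modulus of sensitivity defined above may be computed by exhaustive enumeration when n is small. The following is a minimal sketch, not part of the original disclosure: the function names are hypothetical, the pairs (a, b) are taken as ordered pairs, and the distribution P is assumed to be uniform unless another is supplied. Enumeration over all pairs is feasible only for a small number of variables.

```python
from itertools import product


def hamming(a, b):
    """Number of components in which the n-tuples a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def modulus_of_sensitivity(f, n, d, P=None):
    """Return mu_f(d) for a boolean function f of n variables.

    P maps an n-tuple to its probability; a uniform distribution is
    assumed when P is None (an assumption of this sketch).
    Returns None when mu_f(d) is undefined (denominator is zero).
    """
    points = list(product((0, 1), repeat=n))
    if P is None:
        uniform = 1.0 / len(points)
        P = lambda a: uniform
    numerator = 0.0
    denominator = 0.0
    for a in points:
        for b in points:
            if hamming(a, b) == d:
                weight = P(a) * P(b)
                denominator += weight      # all pairs differing in d components
                if f(a) != f(b):
                    numerator += weight    # pairs on which f disagrees
    if denominator == 0:
        return None
    return numerator / denominator


def parity(a):
    """A highly sensitive function: every single change flips the value."""
    return sum(a) % 2


def conjunction(a):
    """AND of all components: changes only near the all-ones n-tuple."""
    return int(all(a))


if __name__ == "__main__":
    for d in range(0, 5):
        print(d, modulus_of_sensitivity(parity, 4, d),
              modulus_of_sensitivity(conjunction, 4, d))
```

Under these assumptions the parity of four variables has a modulus of 1 at d = 1, whereas the conjunction of four variables has a modulus of 1/8 at d = 1, which illustrates the distinction drawn between sensitive and insensitive functions.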
For the synthesis of insensitive functions of many variables, machines of a different nature than Halpern's are required. Previous devices which have been proposed, or could be used, include the well-known Perceptron of F. Rosenblatt (see M. Minsky, S. Papert: Perceptrons. An introduction to computational geometry. M.I.T. Press, 1969), the networks of Artrons of R. J. Lee, U.S. Pat. No. 3,327,291 issued June 20, 1967, the Slam networks of I. Aleksander (see I. Aleksander: Some psychological properties of digital learning nets. International Journal of Man-Machine Studies 2, 189-212, 1970), and the trainable digital apparatus of W. Armstrong, U.S. Pat. No. 3,613,084 issued Oct. 12, 1971. The present invention has advantages when compared to all of the aforementioned systems.
The central part of a Perceptron, the part wherein learning occurs, is capable of realizing a certain class of so-called linearly separable functions, and there exists a Perceptron convergence theorem which states that certain training algorithms will lead to a state in which the output signal is always the specified response to the input signals. A fundamental difficulty with the Perceptron system is that the class of linearly separable functions of n variables is extremely small, and the power of these systems must be augmented by finding task-specific transformations of the data before they are applied to the function-learner. The present invention, however, needs no such external augmentation (except, of course, to convert input signals into n-tuples of logical signals), since some machine made according to the present invention could, in principle, synthesize any n-variable function whatsoever. The use of several layers of linear-threshold learning devices to attain a larger class of functions has not been very successful up to the present time since no satisfactory training algorithm was known for such networks. The present invention does represent a solution to this problem, though, for a very special kind of linear threshold element.
The present invention has advantages when compared to the adaptive logic networks of Lee, Aleksander, and Armstrong aforementioned. These advantages concern both the capacities of learning and of insensitive extrapolation (generalization). The restriction of the class of boolean functions of two variables realizable by the elements of a binary tree network to precisely the four nonconstant increasing functions of two variables is crucial. These functions are AND(x,y) = xy, OR(x,y) = x+y, LEFT(x,y) = x, and RIGHT(x,y) = y. It is the increasing nature of the functions which equates the output signal of an element in the network to the output of the network whenever that element is responsible, i.e. whenever changing its output would change the network output. A convergence theorem states that a synthesis of a specified function will be obtained by means of a certain algorithm for assigning these four functions to nodes of a binary tree provided (a) a synthesis of the function exists and (b) the components of the n-tuples are stochastically independent under the distribution P. This theorem is proved in the paper: W. W. Armstrong, G. V. Bochmann: A convergence theorem for logical network adaptation. Publ. No. 95, Department d'Informatique, Universite de Montreal, 1972. The said algorithm is not sufficiently powerful for practical applications, and it is therefore replaced herein by a statistical procedure based on the new concept of "heuristic responsibility" of an element when a certain n-tuple is input to the network. The implementation of this concept enables elements of a generally tree-like network to be specialized to the learning of a function which is a restriction of the specified function to a certain subset of n-tuples of input signals emanating from the control unit. This specialization generally becomes more pronounced as training progresses and has the effect of permitting efficient use to be made of every part of the given, fixed, network of elements.
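The relation stated above between the increasing nature of the four functions and the output of a responsible element may be checked on small trees. The following is an illustrative sketch only: the heap-order storage of the tree and all helper names are assumptions of the sketch, and it tests only the plain notion of responsibility (changing the element's output would change the network output), not the statistical "heuristic responsibility" procedure of the invention.

```python
import random
from itertools import product

# The four nonconstant increasing functions of two variables.
GATES = {
    "AND": lambda x, y: x & y,
    "OR": lambda x, y: x | y,
    "LEFT": lambda x, y: x,
    "RIGHT": lambda x, y: y,
}


def node_values(gates, x, forced=None):
    """Evaluate a balanced binary tree stored in heap order.

    gates[i - 1] names the function of element i (element 1 is the root,
    elements 2*i and 2*i + 1 feed element i); x holds the 2**L inputs.
    forced, if given, is a (node, value) pair overriding one element's
    output (hypothetical helper used for the responsibility test).
    Returns a list whose entry i is the output of element i.
    """
    size = len(x)
    val = [0] * size + list(x)   # entries size..2*size-1 are the inputs
    for i in range(size - 1, 0, -1):
        val[i] = GATES[gates[i - 1]](val[2 * i], val[2 * i + 1])
        if forced is not None and forced[0] == i:
            val[i] = forced[1]
    return val


def is_responsible(gates, x, node):
    """True if inverting the output of `node` changes the network output."""
    val = node_values(gates, x)
    flipped = node_values(gates, x, forced=(node, 1 - val[node]))
    return flipped[1] != val[1]


def responsible_outputs_match(L, trials=50, seed=0):
    """Check, on randomly assigned trees with L layers and on every input,
    that each responsible element's output equals the network output."""
    rng = random.Random(seed)
    n = 2 ** L
    names = list(GATES)
    for _ in range(trials):
        gates = [rng.choice(names) for _ in range(n - 1)]
        for x in product((0, 1), repeat=n):
            val = node_values(gates, x)
            for node in range(1, n):
                if is_responsible(gates, x, node) and val[node] != val[1]:
                    return False
    return True


if __name__ == "__main__":
    print(responsible_outputs_match(3))
```

If one of the four functions were replaced by a non-increasing function such as exclusive-or, this check would be expected to fail for some inputs, since a responsible element could then hold the complement of the network output.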
The advantage of the present invention with respect to extrapolation comes from the theoretically and practically demonstrable fact that the functions realized by assignments of AND, OR, LEFT, and RIGHT to the elements in a large balanced binary tree are, averaged over the ensemble of all such assignments, extremely insensitive. For example, in such a tree with L layers of elements, the probability of change of the output signal when uniformly distributed input signals are perturbed is upper bounded by 12/(3L+13), no matter how many of the n = 2^L input signals are inverted! Training of such a network using a partially specified boolean function of n variables is an attempt to select from said ensemble of assignments one which is in conformity with the constraints imposed by the training data. If far fewer than 2^n constraints on function values are imposed, it may be expected that a synthesis so found will yield an insensitive function. Of course, if all 2^n function values are specified in training, the possible syntheses all have the modulus of sensitivity of the specified function, which need not be small at all.
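For very small trees the ensemble average referred to above may be computed exactly and compared with the stated bound 12/(3L+13). The following is a minimal sketch under stated assumptions: every assignment of the four functions is taken as equally probable, the inputs are uniformly distributed, the d inverted inputs are chosen uniformly among all subsets of size d, and the function names are hypothetical. The enumeration grows very rapidly with L and is practical only for L = 1 or L = 2.

```python
from itertools import combinations, product

GATES = {
    "AND": lambda x, y: x & y,
    "OR": lambda x, y: x | y,
    "LEFT": lambda x, y: x,
    "RIGHT": lambda x, y: y,
}


def tree_output(gates, x):
    """Output of a balanced binary tree in heap order (root is element 1)."""
    size = len(x)
    val = [0] * size + list(x)
    for i in range(size - 1, 0, -1):
        val[i] = GATES[gates[i - 1]](val[2 * i], val[2 * i + 1])
    return val[1]


def ensemble_change_probability(L, d):
    """Probability that the output changes when d of the 2**L inputs are
    inverted, averaged over all assignments, inputs, and choices of the
    d inverted positions (assumptions of this sketch)."""
    n = 2 ** L
    changed = 0
    total = 0
    for gates in product(GATES, repeat=n - 1):
        for x in product((0, 1), repeat=n):
            fx = tree_output(gates, x)
            for flip in combinations(range(n), d):
                y = tuple(1 - v if i in flip else v for i, v in enumerate(x))
                changed += fx != tree_output(gates, y)
                total += 1
    return changed / total


def stated_bound(L):
    """The upper bound 12/(3L+13) given in the text."""
    return 12 / (3 * L + 13)


if __name__ == "__main__":
    for L in (1, 2):
        for d in range(1, 2 ** L + 1):
            print(L, d, ensemble_change_probability(L, d), stated_bound(L))
```

For a single element (L = 1) with both inputs inverted, this enumeration gives 0.75, which coincides with the bound 12/16; the values for L = 2 lie below 12/19.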
A recognition of the importance of insensitivity is lacking in previous inventions of adaptive logic networks, including those of the present inventor. Claims are therein made for greater effectiveness if the elements have an increased number of inputs or are each capable of acting as a universal logic element. Unfortunately, such elements yield more sensitive ensembles of functions and may be useless for extrapolation.