The present invention relates to a circuit and method employing logical operations for enhancing the transfer function nonlinearities of pulse frequency encoded neurons.
The back propagation algorithm credited to D. E. Rumelhart et al. (1) is one of the most popular and successful learning or adaptation algorithms for neural networks. The neural model usually used assumes a McCulloch-Pitts form in which the input signal vector is applied to a linear weighting network that produces a value at its output representative of the vector dot product of the input signal vector and a weighting vector. The output of the dot product network is typically applied to an output or activation network with a nonlinear no-memory transfer function. The most desirable nonlinearity for a wide variety of applications takes on the form of a sigmoidal function.
The back propagation algorithm provides a method for adjusting the weights in the network by tracing back from output to input the various contributions to the global error made by each weight component of the weighting vector network. This requires knowledge of the form of the transfer function, e.g., sigmoidal function. (Note that "sigmoidal" is a descriptor for a class of nonlinearities that have two saturating (limiting) states with a smooth transition from one to the other). An analytical expression representing the sigmoid or alternatively, "in vivo" perturbations of input signals allows the small signal gain between the input and output of the nonlinearity to be determined. The small signal perturbation gain corresponds to the derivative of the output relative to the input signal. Knowledge of the gain together with the input signal and current weight values permits the estimation of the contribution of each weight to the error.
More significantly, the adaptation process is concerned with minimizing the error between the observed response to a given input vector and the desired or ideal response. The algorithm requires knowledge of the derivative of the function, the weights, and the input values in order to estimate the error contribution of each unadjusted weight.
In order to make a reasonable estimate of each error contribution, the activation function should be known, stable and differentiable. In addition, practice has shown that a sigmoidal characteristic is preferred.
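The relationship between small signal gain and the error contribution of each weight can be sketched for a single neuron. This is a minimal illustration only, assuming a logistic sigmoid and a squared-error measure; neither choice, nor the names `sigmoid`, `weight_gradients`, `x`, `w` and `target`, comes from the cited references.

```python
import math

def sigmoid(net):
    # Assumed logistic form of the sigmoidal activation function.
    return 1.0 / (1.0 + math.exp(-net))

def weight_gradients(x, w, target):
    """Estimate the error contribution of each weight for one neuron.

    x, w: input vector and weighting vector (hypothetical names).
    target: desired response for this input vector.
    """
    net = sum(xi * wi for xi, wi in zip(x, w))   # vector dot product
    y = sigmoid(net)                              # observed response
    gain = y * (1.0 - y)                          # small signal gain df/d(net)
    error = y - target                            # observed minus desired
    # Derivative of 0.5 * error**2 with respect to each weight.
    return [error * gain * xi for xi in x]
```

If the gain cannot be determined reliably, because the activation function is unknown or unstable, none of these per-weight estimates can be trusted.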
The present invention relates to the generation of a sigmoidal type activation function in artificial neurons using pulse-frequency type of data encoding. Neurons using this type of data encoding may be classified as deterministic pulse-frequency and stochastic pulse-frequency encoded neurons.
A deterministic pulse-frequency neuron is described in the Tomlinson, Jr., U.S. Pat. No. 4,893,255 dated Jan. 9, 1990. Tomlinson describes a neuron in which the vector dot product of the input and reference weighting vector is made by pulse-width modulation techniques. Two dot products are formed: one for excitatory inputs and the second for inhibiting inputs. The pulse-width encoded elements of each dot-product are separately OR-ed to form a nonlinear dot-product and decoded as analog voltage levels. Each voltage level is separately applied to a voltage-to-frequency converter. The output of each converter is a deterministic pulse stream with a pulse rate corresponding to the input voltage level. At this point, two pulse streams obtain (one representative of the excitatory dot product and the other representative of the inhibitory dot product) from which an activation function, or what Tomlinson, Jr., calls a "squash" function, is to be generated. The two pulse trains are combined so that "if an excitatory and inhibitory spike both occur simultaneously the inhibitory spike causes the complete nullification of any output spike". This final pulse stream is an approximation representative of the desired output function, i.e., a pulse rate corresponding to the vector dot product of the input vector and the reference weighting vector modified by a sigmoidal activation function.
The paper entitled "Neural Network LSI Chip with On-chip Learning", Eguchi, A., et al., Proceedings of the International Conference on Neural Networks, Seattle, Wash., Jul. 8-12, 1991 is indicative of the current state of the art in stochastically encoded neural networks. Neurons of this type were used in the simulated data presented herein. The stochastically encoded pulse frequency neuron uses Boolean logic methods operating on stochastic Bernoulli type sequences rather than the pulse-width techniques and deterministic voltage-to-frequency generators of Tomlinson, Jr. However, the squash or activation function that results from either method is comparable, having the same limitations and drawbacks as will be revealed in the following detailed discussion. Consequently, the present invention is applicable to both stochastic and deterministic neural output signals for the purpose of generating a more ideal sigmoidal activation function.
An ideal sigmoidal activation function is typically represented to be approximately of the form ##EQU1## as shown in FIG. 1. The derivative of this function, illustrated in FIG. 2, is ##EQU2##
A typical stochastically encoded, pulse frequency neuron is shown in FIG. 3. Each neuron 20 has both excitatory and inhibitory sets of inputs, shown as the composite elements 21 and 22 respectively. The input signals are weighted by means of AND-gates, typified by elements 23 and 24, and a set of input weighting signals on weight vector input lines 30. The outputs of the two sets of AND-gates are applied to OR-gates 25 and 26. The output of the excitatory OR-gate 25 is intended to be a pulse train representative of the activation function for the excitatory inputs and may be expressed as

$$f(\mathrm{net}^{+}) = 1 - e^{-\mathrm{net}^{+}} \qquad (3)$$
where net+ is the total number of pulses (spikes) being generated at the OR-gate inputs. f(net+) is the probability that any output of the OR-gate is a one and represents the upper half of the activation function. Similarly, the output of OR-gate 26 is intended to be representative of the lower half of the activation function and is expressed as

$$f(\mathrm{net}^{-}) = 1 - e^{-\mathrm{net}^{-}} \qquad (4)$$
This inhibitory half-activation function signal is complemented by inverter 27 and then combined with the excitatory half-activation function by means of AND-gate 28. The pulse rate output of AND-gate 28 may be expressed as

$$f(\mathrm{net}) = f(\mathrm{net}^{+})\,\bigl(1 - f(\mathrm{net}^{-})\bigr) \qquad (5)$$
where f(net+) corresponds to the probability that any output pulse from OR-gate 25 is a one and (1-f(net-)), the probability that the output of OR-gate 26 is a zero. Thus, inverter 27 allows an excitatory pulse from OR-gate 25 to pass through AND-gate 28 only if no inhibitory pulse is present at the output of OR-gate 26. In this manner the complete activation function, f(net), is made available at output 29.
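The gate-level behavior of the neuron of FIG. 3 can be sketched as a short simulation. This is an illustrative model only: it assumes every input and weight line carries an independent Bernoulli pulse stream, and the function names and the `(input_rate, weight_rate)` pair representation are hypothetical. For independent streams the exact output probability is the product form computed by `exact_rate`, which the exponential expressions above approximate.

```python
import random

def simulate_neuron(exc, inh, n_steps=100_000, seed=0):
    """Estimate the pulse rate at output 29.

    exc, inh: lists of (input_rate, weight_rate) pairs, each a pulse
    probability per clock step in [0, 1] (assumed representation).
    """
    rng = random.Random(seed)

    def or_of_weighted(lines):
        pulses = []
        for x, w in lines:
            x_pulse = rng.random() < x          # input pulse this step
            w_pulse = rng.random() < w          # weight pulse this step
            pulses.append(x_pulse and w_pulse)  # weighting AND-gate
        return any(pulses)                      # OR-gate

    ones = 0
    for _ in range(n_steps):
        excitatory = or_of_weighted(exc)        # OR-gate 25
        inhibitory = or_of_weighted(inh)        # OR-gate 26
        if excitatory and not inhibitory:       # inverter 27, AND-gate 28
            ones += 1
    return ones / n_steps

def exact_rate(exc, inh):
    """Exact output probability for independent pulse streams."""
    p_no_exc = 1.0
    for x, w in exc:
        p_no_exc *= 1.0 - x * w
    p_no_inh = 1.0
    for x, w in inh:
        p_no_inh *= 1.0 - x * w
    return (1.0 - p_no_exc) * p_no_inh
```

Comparing `simulate_neuron(exc, inh)` with `exact_rate(exc, inh)` for the same arguments shows how a single inhibitory line scales down the whole excitatory contribution.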
In order to gain insight into the behavior of f(net), substitute equations (3) and (4) into equation (5), thus yielding

$$f(\mathrm{net}) = e^{-\mathrm{net}^{-}}\,\bigl(1 - e^{-\mathrm{net}^{+}}\bigr) \qquad (6)$$

Also, note that net represents the linear sum

$$\mathrm{net} = \mathrm{net}^{+} - \mathrm{net}^{-}$$
because the pulse frequency coding encodes negative and positive values separately as positive valued pulse rates. Significantly, this means that there are many ways to represent a number. Zero, for example, may be represented by $\mathrm{net}^{+} = \mathrm{net}^{-} = q$ where $q \geq 0$. This feature will prove to have a significant impact on the behavior of the nonlinear device represented by FIG. 3.
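The effect of this non-unique representation can be checked numerically. The following sketch evaluates equation (6) for several encodings of net = 0; the function name `f_net` and the chosen values of q are illustrative only.

```python
import math

def f_net(net_plus, net_minus):
    # Equation (6): f(net) = exp(-net-) * (1 - exp(-net+)).
    return math.exp(-net_minus) * (1.0 - math.exp(-net_plus))

# Three encodings of the same value net = (net+) - (net-) = 0.
for q in (0.0, 0.5, 1.0):
    print(f"q = {q:.1f}  ->  f(net) = {f_net(q, q):.4f}")
```

Although all three cases encode net = 0, equation (6) returns a different output for each, so the output is not a function of net alone.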
Consider a simple example in which we have two input variables, $x_1$ and $x_2$. Let

$$-1 \leq x_1 \leq +1 \qquad (7)$$

and

$$0 \leq x_2 \leq 1 \qquad (8)$$
Because of the negative range of variable $x_1$, it must be represented by two absolute magnitude terms, $x_1^{+}$ and $x_1^{-}$:

$$x_1 = x_1^{+} - x_1^{-} \qquad (9)$$

where

$$0 \leq x_1^{+} \leq 1$$

$$0 \leq x_1^{-} \leq 1$$

And for consistency, let

$$x_2 = x_2^{+} \qquad (10)$$

where

$$0 \leq x_2^{+} \leq 1$$
Thus, the sum of $x_1$ and $x_2$, or net, may be expressed as

$$\mathrm{net} = x_1^{+} + x_2^{+} - x_1^{-} \qquad (11)$$

Also,

$$\mathrm{net}^{+} = x_1^{+} + x_2^{+} \qquad (12)$$

and

$$\mathrm{net}^{-} = x_1^{-} \qquad (13)$$
In terms of FIG. 3, the variables $x_1^{+}$ and $x_2^{+}$ would be excitatory signals applied to inputs 21 while $x_1^{-}$ would be an inhibitory signal applied to inputs 22.
Substituting equations 12 and 13 into equation 6 yields

$$f(\mathrm{net}) = e^{-x_1^{-}}\,\bigl(1 - e^{-(x_1^{+} + x_2^{+})}\bigr) \qquad (14)$$
It should be noted that because equal-valued $x_1$ inputs may be expressed by different combinations of $x_1^{+}$ and $x_1^{-}$, the value of f(net) is not uniquely determined by the values of $x_1$ and $x_2$ alone. This is due to the failure of linear superposition caused by the nonlinearity of the neuron of FIG. 3.
FIG. 4 is an evaluation of f(net) for three cases wherein the inhibitory signal, $x_1^{-}$, is held constant as the value of net is varied. The range of net is from -1 to +2. The solid lines represent the locus of f(net) for $x_1^{-}$ = 0, 1/2 and 1. The dotted curves are drawn to suggest the envelope of the extreme range of f(net) for $0 \leq x_1^{-} \leq 1$.
The significance of FIG. 4 is that no single stable transfer characteristic between net and f(net) can be established without imposing unrealistic constraints on the inputs $x_1^{+}$, $x_2^{+}$ and $x_1^{-}$. Thus, from this simple example it may be seen that the "squash" function of Tomlinson, Jr., as represented by equation 6, does not necessarily yield a sigmoidal characteristic but rather results in a non-uniquely defined function for all values of net except the extrema $\pm 1$.
It will be appreciated that in order to effectively use the method of back propagation which depends on the determination of derivatives, such as df(net)/d(net), a better sigmoidal transfer characteristic is desired.
The two-input example above may not be representative of neural networks having a larger number of excitatory and inhibitory input signals. To demonstrate this, refer to FIG. 5, which summarizes the results of numerous simulations using different numbers of excitatory and inhibitory inputs for values of net ranging from -1 to +2. The two numbers associated with each curve represent the number of excitatory and inhibitory inputs, e.g., the lower curve labelled (6+, 3-) represents 6 excitatory and 3 inhibitory inputs.
Also, unlike FIG. 4, each transfer characteristic is the average over all uniformly distributed combinations of excitatory and inhibitory inputs for any given value of net. FIG. 5 shows that even the averaged transfer characteristic may deviate substantially from a sigmoidal characteristic. As the number of positive and negative inputs increases from (1+, 1-) to (6+, 3-), the average characteristic changes from a sigmoidal to an increasingly exponential-like form. This is due to the functional asymmetry between excitatory and inhibitory pulses in the network: at any particular instant, a single inhibitory pulse at the input may nullify any number of simultaneously applied excitatory pulses. As the number of inputs increases, there is an increasing proportion of possible inputs that sum to a given value of net and that contain at least one inhibitory pulse at a given instant. Thus, as this proportion increases, the expected (average) value of the activation function, for a given value of net, decreases due to the increasing number of nullifications occurring within the pulse stream. One object of the present invention is to correct for this deleterious effect.
In addition to the fact that FIG. 5 is a set of average values, the actual simulation used to obtain these data, which does not rely upon the theoretical assumptions that led to the derivation of f(net) in equation (6) and FIG. 4, has a maximum value of f(net)=1 for the maximum value of net=2. However, the theoretical model for deriving equation 6 assumes an infinite number of inputs, so that f(net) approaches 1 only as net increases indefinitely.
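The exponential form of equations (3) and (4) can be motivated by a short calculation. This is a sketch under assumptions not stated explicitly in the text: the $n$ OR-gate input lines carry statistically independent pulse streams, each with pulse probability $p_i$, and the total $\mathrm{net}^{+}$ is spread evenly so that $p_i = \mathrm{net}^{+}/n$.

$$P(\text{output} = 1) = 1 - \prod_{i=1}^{n} (1 - p_i) = 1 - \left(1 - \frac{\mathrm{net}^{+}}{n}\right)^{n} \;\longrightarrow\; 1 - e^{-\mathrm{net}^{+}} \quad \text{as } n \to \infty$$

With a finite $n$ the product form applies instead. For example, two excitatory lines each at probability 1 give an output probability of exactly 1, whereas the limiting expression gives $1 - e^{-2}$ for the same $\mathrm{net}^{+} = 2$.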
The present invention is also related to a copending application Ser. No. 07/673,804 entitled "A Circuit Employing Logical Gates for Calculating Activation Function on Derivatives on Stochasically-Encoded Signals" filed Mar. 22, 1991 by the same inventors and assigned to the same assignee.