The neural computing paradigm is characterized as a dynamic, highly parallel, computationally intensive system typically consisting of input weight multiplications, product summation, neural state calculations, and complete connectivity among the neurons.
Most artificial neural systems (ANS) in commercial use are modeled on von Neumann computers. This allows the processing algorithms to be easily changed and different network structures implemented, but at a cost of slow execution rates for even the most modestly sized network. As a consequence, some parallel structures supporting neural networks have been developed in which the processing elements emulate the operation of neurons to the extent required by the system model and may deviate from present knowledge of actual neuron functioning to suit the application.
An example of the typical computational tasks required by a neural network processing element may be represented by a subset of the full Parallel Distributed Processing model described by D. E. Rumelhart, J. L. McClelland, and the PDP Research Group, Parallel Distributed Processing Vol. 1: Foundations, Cambridge, Mass., MIT Press, 1986. A network of such processing elements, or neurons, is described in J. J. Hopfield, "Neurons With Graded Response Have Collective Computational Properties Like Those of Two-State Neurons," Proceedings of the National Academy of Sciences 81, pp. 3088-3092, May 1984. This processing unit is illustrated in FIG. 1 and Table 1 of FIG. 32.
Referring to FIG. 1, neural network processing unit, or neuron 40, typically performs processing tasks including input function $I_i$ 44 and activity function $Y_i$ 42, and includes connectivity network 46, 48 which, in the worst case, connects each such neuron to every other neuron including itself.
Activity function $Y_i$ 42 may be a nonlinear function of the type referred to as a sigmoid function. Other examples of activity function $Y_i$ 42 include threshold functions, probabilistic functions, and so forth. A network of such nonlinear sigmoid processing elements 40 represents a dynamic system which can be simulated on a digital processor. From a mathematical perspective, nonlinear dynamic models of neurons can be digitally simulated by taking the derivative of the nonlinear equations governing the neurons' functions with respect to time and then using numerical differentiation techniques to compute the function. This mathematical basis allows mapping the nonlinear continuous functions of neural networks onto digital representations. In discrete time steps, input function $I_i$ multiplies digital weight values $W_{ij}$ by digital signal values $Y_j$ on each neuron input and then forms a sum of these products' digital values. The input to the activity function $Y_i$ is the output of input function $I_i$, and its output, in this case, is activity function $Y_i$ directly; alternatively, the output could be some function of $Y_i$.
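The discrete-time update described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the implementation of any system described here: the names `sigmoid` and `update_neurons`, the logistic form of the sigmoid, and the synchronous update of all neurons are assumptions made for the example.

```python
import math

def sigmoid(x):
    """Logistic sigmoid; one common choice of activity function (assumed here)."""
    return 1.0 / (1.0 + math.exp(-x))

def update_neurons(weights, y):
    """Perform one discrete time step for a completely connected network.

    weights[i][j] holds W_ij, the weight on neuron i's input from neuron j.
    y[j] holds the current neuron value Y_j.
    Returns the list of new Y_i values.
    """
    new_y = []
    for w_row in weights:
        # Input function I_i: sum over j of W_ij * Y_j
        i_i = sum(w_ij * y_j for w_ij, y_j in zip(w_row, y))
        # Activity function Y_i applied to the input function's output
        new_y.append(sigmoid(i_i))
    return new_y
```

For example, `update_neurons([[1.0, -1.0], [0.5, 0.5]], [1.0, 1.0])` forms the inputs 0.0 and 1.0 and passes each through the sigmoid.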
The accuracy of the nonlinear digital simulation of a neural network depends upon the precision of the weights, neuron values, product, sum of product, and activity values, and the size of the time step utilized for simulation. The precision required for a particular simulation is problem dependent. The time step size can be treated as a multiplication factor incorporated into the activation function. The neurons in a network may all possess the same functions, but this is not required.
Neurons modeled on a neural processor may be simulated in a "direct" and/or a "virtual" implementation. In a direct method, each neuron has a physical processing element (PE) available which may operate simultaneously in parallel with the other neuron PE's active in the system. In a "virtual" implementation, multiple neurons are assigned to individual hardware processing elements (PE's), which requires that a PE's processing be shared across its "virtual" neurons. The performance of the network will be greater under the "direct" approach, but most prior art artificial neural systems utilize the "virtual" neuron concept, due to architecture and technology limitations.
Two major problems in a "direct" implementation of neural networks are the interconnection network between neurons and the computational speed of a neuron function. First, in an artificial neural system with a large number of neurons (processing units, or PE's), the method of connecting the PE's becomes critical to performance as well as cost. In a physical implementation of such direct systems, complete connectivity is a requirement difficult if not impossible to achieve due to the very large number of interconnection lines required. Second, the neural processing load includes a massive number of parallel computations which must be done for the "weighting" of the input signals to each neuron.
The relatively large size of the neural processing load can be illustrated with respect to a $64 \times 64$ element Hopfield network (supra), completely connected with symmetrical weights. Such a network has $64 \times 64 = 4{,}096$ neurons and, being fully interconnected, $4{,}096 \times 4{,}096$ or approximately $16 \times 10^6$ weight values. A $128 \times 128$ element Hopfield network has $128 \times 128 = 16{,}384$ neurons with approximately $256 \times 10^6$ weights. A sum of the weights times neuron input values across all neurons provides the input to each neuron's activation function, such as the sigmoid activation function previously described. Each computation contributes to the overall processing load which must be completed for all neurons every updating cycle of the network.
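The neuron and weight counts quoted above can be checked with a short calculation. The helper name `weight_count` is hypothetical, and the sketch counts one weight per ordered pair of neurons (self-connections included) without exploiting weight symmetry.

```python
def weight_count(side):
    """Return (neurons, weights) for a side x side element, completely connected network."""
    neurons = side * side
    # One weight per ordered pair of neurons, self-connections included
    weights = neurons * neurons
    return neurons, weights

for side in (64, 128):
    n, w = weight_count(side)
    print(f"{side}x{side}: {n} neurons, {w} weights")
```

The exact weight counts are 16,777,216 and 268,435,456, which are 16 and 256 times $2^{20}$ respectively, so the rounded figures in the text slightly understate the totals.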
One structure for implementing neural computers is a ring systolic array. A systolic array is a network of processors which rhythmically compute and pass data through a system. One example of a systolic array for implementing a neural computer is the pipelined array architecture described by S. Y. Kung and J. N. Hwang, "A Unified Systolic Architecture for Artificial Neural Networks," Journal of Parallel and Distributed Computing 6, pp. 358-387, 1989, and illustrated in FIG. 2 and Table 2 of FIG. 33. In this structure each PE 50, 52, 54 is treated as a neuron, labeled $Y_i$. Each neuron contains the weight storage 51, 53, . . . , 55 for that neuron, with the weights stored in a circularly shifted order which corresponds to the $j^{th}$ neuron values as they are linearly shifted from PE to PE. Assuming the initial neuron values and weights have been preloaded into PEs 50, 52, . . . , 54 from a host, the network update cycle computes the $I_i$ (steps 1 through 7) and $Y_i$ (step 8) values, as shown in Table 2. In this fashion a neural network can be modeled on a systolic array.
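The ring update can be illustrated with a small sequential simulation. This is a sketch under stated assumptions, not a reproduction of FIG. 2 or of the step numbering in Table 2: the function name `ring_systolic_update`, the shift direction (each PE receives from its higher-numbered neighbor), and the logistic sigmoid are all chosen for the example.

```python
import math

def ring_systolic_update(weights, y):
    """Simulate one network update cycle on a ring of N processing elements.

    weights[i][j] holds W_ij and y[j] holds the initial neuron value Y_j.
    Returns the new Y_i values, one per PE.
    """
    n = len(y)
    # Preload: PE i stores its weights in circularly shifted order, so that
    # its k-th stored weight is W[i][(i + k) % n].
    stored = [[weights[i][(i + k) % n] for k in range(n)] for i in range(n)]
    held = list(y)     # neuron value currently held in each PE
    acc = [0.0] * n    # running sum I_i accumulated in each PE
    for k in range(n):
        # In hardware all PEs multiply-accumulate in parallel;
        # this inner loop is only a sequential stand-in.
        for i in range(n):
            acc[i] += stored[i][k] * held[i]
        # Shift neuron values one PE around the ring (assumed direction):
        # PE i receives the value held by PE (i + 1) % n.
        held = held[1:] + held[:1]
    # After N steps each PE holds its complete I_i; apply the activity function.
    return [1.0 / (1.0 + math.exp(-a)) for a in acc]
```

Because the stored weight order matches the order in which neuron values arrive, each PE pairs $W_{ij}$ with $Y_j$ at every step, and the result equals a direct weighted-sum update of the same network.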
The ring systolic array architecture (FIG. 2 and Table 2) has the following performance characteristics assuming overlapped operations:

$$\text{SYSTOLIC RING period} = N\delta_M + \delta_A + \delta_{bus} + \delta_S \qquad (1)$$

where the following delay variables are used, representing the delay through each named element:

$\delta_M$ = Multiplier delay.
$\delta_A$ = Communicating Adder: 2-1 add stage delay.
$\delta_S$ = Sigmoid generator delay.
$\delta_{bus}$ = Communicating Adder: communications bypass stage delay.

and $N$ represents the total number of neurons.
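Equation (1) can be evaluated directly once the delays are known. The sketch below is illustrative only: the function name `systolic_ring_period` is hypothetical, and the delay values in the example are arbitrary placeholder units, not figures from any actual hardware.

```python
def systolic_ring_period(n, delta_m, delta_a, delta_bus, delta_s):
    """Evaluate equation (1): N * delta_M + delta_A + delta_bus + delta_S."""
    return n * delta_m + delta_a + delta_bus + delta_s

# Hypothetical delays in arbitrary time units, for a 4,096-neuron network
period = systolic_ring_period(n=4096, delta_m=4, delta_a=1, delta_bus=1, delta_s=1)
print(period)
```

For large $N$ the $N\delta_M$ term dominates the period, so under these assumptions the multiplier delay largely determines the update rate.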
It is an object of this invention to provide an improved array processor apparatus and method.
It is a further object of this invention to provide an improved neural system architecture and method.
It is a further object of this invention to provide an artificial neural system which provides improved direct modeling of large neural networks.
It is a further object of this invention to provide an improved interconnection network for simplifying the physical complexity of a neural array characterized by total connectivity.
It is a further object of this invention to provide an improved neural array architecture and method adapted for efficient distribution over a plurality of interconnected semiconductor chips.