A computer system generally includes one or more processors, a memory and an input/output system. The memory stores data and instructions for processing the data. The processor(s) process the data in accordance with the instructions, and store the processed data in the memory. The input/output system facilitates loading of data and instructions into the system, and obtaining processed data from the system.
Most modern computer systems have been designed around a "von Neumann" paradigm, under which each processor has a program counter that identifies the location in the memory which contains the processor's next instruction. During execution of an instruction, the processor increments the program counter to identify the location of the next instruction to be processed. Processors in such a system may share data and instructions; however, to avoid interfering with each other in an undesirable manner, such systems are typically configured so that the processors process separate instruction streams, that is, separate series of instructions, and sometimes complex procedures are provided to ensure that the processors' access to the data is orderly.
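By way of illustration only, the program-counter behavior described above can be sketched in a few lines of Python. The instruction names (`LOAD_CONST`, `ADD`, `HALT`) and the tuple encoding are hypothetical choices made for this sketch and are not taken from any particular machine.

```python
def run(program, memory):
    """Minimal fetch/execute loop driven by a program counter (illustrative only)."""
    pc = 0  # program counter: index of the next instruction to fetch
    while pc < len(program):
        op, *args = program[pc]   # fetch the instruction the counter identifies
        pc += 1                   # increment so the counter identifies the next one
        if op == "LOAD_CONST":    # hypothetical opcode: memory[addr] = value
            addr, value = args
            memory[addr] = value
        elif op == "ADD":         # hypothetical opcode: memory[dst] = memory[a] + memory[b]
            dst, a, b = args
            memory[dst] = memory[a] + memory[b]
        elif op == "HALT":        # hypothetical opcode: stop processing
            break
        else:
            raise ValueError(f"unknown opcode: {op}")
    return memory


program = [
    ("LOAD_CONST", 0, 2),
    ("LOAD_CONST", 1, 3),
    ("ADD", 2, 0, 1),
    ("HALT",),
]
result = run(program, [0, 0, 0, 0])
```

In this sketch the memory holds only data and the program is kept in a separate list for brevity; in a von Neumann machine both would reside in the same memory.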
In von Neumann machines, instructions in one instruction stream are used to process data in a single data stream. Such machines are typically referred to as SISD (single instruction/single data) machines if they have one processor, or MIMD (multiple instruction/multiple data) machines if they have multiple processors. In a number of types of computations, such as processing of arrays of data, the same instruction stream may be used to process data in a number of data streams. For these computations, SISD machines would iteratively perform the same operation or series of operations on the data in each data stream. Recently, single instruction/multiple data (SIMD) machines have been developed which process the data in all of the data streams in parallel. Since SIMD machines process all of the data streams in parallel, such problems can be processed much more quickly than in SISD machines, and at lower cost than with MIMD machines providing the same degree of parallelism.
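The difference between iterating over data streams and processing them in parallel can be illustrated with the following Python sketch. It is a simplified model under stated assumptions: each data stream is reduced to a single value, each instruction is a Python function, and the count of instruction issues stands in, very roughly, for processing time. The function names are hypothetical.

```python
def sisd_apply(instructions, data_streams):
    """One processor handles each data stream in turn (SISD-style iteration)."""
    results, issues = [], 0
    for value in data_streams:
        for instr in instructions:
            value = instr(value)
            issues += 1           # every instruction is issued once per data stream
        results.append(value)
    return results, issues


def simd_apply(instructions, data_streams):
    """Each instruction is issued once and applied to every data stream (SIMD-style)."""
    values, issues = list(data_streams), 0
    for instr in instructions:
        # Conceptually all elements operate at the same time; the list
        # comprehension only simulates that lockstep behavior sequentially.
        values = [instr(v) for v in values]
        issues += 1               # a single issue serves all data streams
    return values, issues


instructions = [lambda x: x + 1, lambda x: x * 2]
sisd_results, sisd_issues = sisd_apply(instructions, [1, 2, 3])
simd_results, simd_issues = simd_apply(instructions, [1, 2, 3])
```

Both functions produce the same results; only the number of instruction issues differs, growing with the number of data streams in the SISD case and remaining fixed in the SIMD case.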
The aforementioned Hillis patents and Hillis, et al., patent application disclose an SIMD machine which includes a host computer, a micro-controller and an array of processing elements, each including a bit-serial processor and a memory. The host computer, inter alia, generates commands which are transmitted to the micro-controller. In response to a command, the micro-controller transmits one or more SIMD instructions to the array, each SIMD instruction enabling all of the processing elements to perform the same operation in connection with data stored in the elements' memories.
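The flow from host command to SIMD instruction to processing element can be pictured with the following Python sketch. It is not the arrangement disclosed in the referenced patents; the class names, the command name `SUM_THEN_SQUARE`, and the instruction encoding are all hypothetical and are chosen only to show one command expanding into several instructions that every element executes on its own memory.

```python
class ProcessingElement:
    """One array element with its own private memory (illustrative model)."""

    def __init__(self, memory):
        self.memory = list(memory)

    def execute(self, instr):
        op, dst, a, b = instr
        if op == "ADD":
            self.memory[dst] = self.memory[a] + self.memory[b]
        elif op == "MUL":
            self.memory[dst] = self.memory[a] * self.memory[b]
        else:
            raise ValueError(f"unknown SIMD instruction: {op}")


class MicroController:
    """Expands a host command into SIMD instructions and broadcasts them."""

    # Hypothetical command table: one command maps to one or more instructions.
    COMMANDS = {
        "SUM_THEN_SQUARE": [("ADD", 2, 0, 1), ("MUL", 2, 2, 2)],
    }

    def __init__(self, array):
        self.array = array

    def handle(self, command):
        for instr in self.COMMANDS[command]:
            for element in self.array:   # same instruction, every element
                element.execute(instr)


array = [ProcessingElement([1, 2, 0]), ProcessingElement([3, 4, 0])]
MicroController(array).handle("SUM_THEN_SQUARE")   # the "host" issues one command
memories = [element.memory for element in array]
```

Each element performs the same two operations, but on the data held in its own memory, so the results differ from element to element.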
The array disclosed in the Hillis patents and Hillis, et al., patent application also includes two communications mechanisms which facilitate transfer of data among the processing elements. One mechanism enables each processing element to selectively transmit data to one of its four nearest-neighbor processing elements. The second mechanism, a global router interconnecting integrated circuit chips housing the processing elements in a hypercube, enables any processing element to transmit data to any other processing element in the system. In the first mechanism, termed "NEWS" (for the North, East, West, and South directions in which a processing element may transmit data, if the processing elements are considered arranged in a two-dimensional array), the micro-controller enables all of the processing elements, in unison, to transmit bit-serial data to the neighbor in the selected direction and to receive bit-serial data from the neighbor in the opposite direction. More recently, arrays have been developed in which "NEWS"-type mechanisms facilitate transfer of data in unison among processing elements that are considered arranged in a three-dimensional array.
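The two interconnection patterns can be sketched in Python as follows. The sketch makes several assumptions that are not stated above: the two-dimensional array wraps around at its edges, whole values rather than bit-serial streams are moved, and hypercube nodes are numbered so that neighbors differ in exactly one address bit, which is the usual convention for hypercubes. The function names are hypothetical.

```python
# Row/column offsets for each NEWS direction (row 0 is taken to be "north").
OFFSETS = {"N": (-1, 0), "E": (0, 1), "W": (0, -1), "S": (1, 0)}


def news_shift(grid, direction):
    """Every element sends its value to the neighbor in one direction, in unison.

    Assumption: the grid wraps around at its edges.
    """
    d_row, d_col = OFFSETS[direction]
    rows, cols = len(grid), len(grid[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            out[(r + d_row) % rows][(c + d_col) % cols] = grid[r][c]
    return out


def hypercube_neighbors(node, dimensions):
    """Nodes directly connected to `node` in a hypercube of the given dimension.

    Assumption: neighbor addresses differ from `node` in exactly one bit.
    """
    return [node ^ (1 << d) for d in range(dimensions)]


grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
shifted_east = news_shift(grid, "E")
shifted_north = news_shift(grid, "N")
neighbors_of_5 = hypercube_neighbors(5, 3)
```

In the NEWS sketch the direction is a single parameter shared by all elements, reflecting control by the micro-controller; no element carries a destination address.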
On the other hand, in the global router, the data is transmitted in the form of messages, with each message containing an address that identifies the processing element to receive the data. The micro-controller enables the processing elements to transmit messages, in bit-serial format, through the global router in unison, and controls the timing of the global router, but it does not control the destination of the message, as it does in the NEWS mechanism. However, the address, and other message protocol information that may be transmitted in the message, represent overhead that reduces the rate at which data can be transmitted.
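The cost of the address overhead can be made concrete with a short Python sketch. The bit counts used in the example (a 32-bit payload and a 12-bit address, enough to distinguish 4096 processing elements) are hypothetical figures chosen for illustration, as are the function names; the delivery function models only the addressing, not the router's topology or timing.

```python
def payload_efficiency(payload_bits, address_bits, protocol_bits=0):
    """Fraction of the transmitted bits that carry data rather than overhead."""
    total_bits = payload_bits + address_bits + protocol_bits
    return payload_bits / total_bits


def route(messages, num_elements):
    """Deliver (destination_address, payload) messages to per-element inboxes."""
    inboxes = [[] for _ in range(num_elements)]
    for destination, payload in messages:
        inboxes[destination].append(payload)   # the message itself names its receiver
    return inboxes


# Hypothetical figures: 32 data bits sent with a 12-bit destination address.
efficiency = payload_efficiency(32, 12)
inboxes = route([(2, "a"), (0, "b"), (2, "c")], 3)
```

With these figures only 32 of every 44 transmitted bits are data, whereas a NEWS transfer of the same data would carry no address at all.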
As noted above, the arrays disclosed in the Hillis patents and Hillis, et al., patent application include bit-serial processors. These processors process successive bits of data serially. More recently, processor arrays have been developed which, in addition to the bit-serial processors, also include co-processors which process data in word-parallel format. Each of the co-processors is connected to a predetermined number of the bit-serial processors to form a processing node. The aforementioned Kahle, et al., patent application describes an arrangement for connecting such co-processors in the array.
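The contrast between bit-serial and word-parallel processing can be illustrated with addition. The following Python sketch assumes unsigned operands of a fixed width and assumes that the bit-serial processor handles the least significant bit first; neither assumption is drawn from the referenced patents, and the function names are hypothetical.

```python
def bit_serial_add(a, b, width):
    """Add two unsigned values one bit per step, carrying between steps.

    Assumption: bits are processed least significant first.
    """
    result, carry = 0, 0
    for i in range(width):            # one step per bit position
        bit_a = (a >> i) & 1
        bit_b = (b >> i) & 1
        total = bit_a + bit_b + carry
        result |= (total & 1) << i    # sum bit for this position
        carry = total >> 1            # carry into the next position
    return result                     # a carry out of the top bit is discarded


def word_parallel_add(a, b, width):
    """Add two unsigned values as whole words in a single step."""
    return (a + b) & ((1 << width) - 1)


serial_sum = bit_serial_add(200, 100, 8)
parallel_sum = word_parallel_add(200, 100, 8)
```

Both functions give the same result for any operands that fit in the stated width; the bit-serial version takes a number of steps equal to the width, while the word-parallel version operates on all bits of the word at once.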