A SIMD (Single Instruction, Multiple Data) parallel computer typically comprises a control unit connected to a large number of identical processing elements, or PEs. Each PE typically includes an arithmetic logic unit (ALU), a few data registers, and a data memory. In operation, data is read from the data memory of each PE, operated upon by that PE's ALU, and stored back in the data memory, all under the direction of the control unit. The data registers in each PE facilitate the manipulation of data. Unlike other parallel computers, the PEs in a SIMD parallel processor all perform the same operation at the same time, but on different data.
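To make the lockstep execution model concrete, here is a toy simulation. The `PE` and `ControlUnit` classes and the `broadcast` method are hypothetical names chosen for illustration; the sequential loop stands in for what the hardware does in a single step across all PEs.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PE:
    """One processing element with its own small data memory (hypothetical model)."""
    memory: List[int] = field(default_factory=list)


class ControlUnit:
    """Broadcasts one instruction to every PE (hypothetical model)."""

    def __init__(self, pes: List[PE]) -> None:
        self.pes = pes

    def broadcast(self, op: Callable[[int, int], int], src_a: int, src_b: int, dst: int) -> None:
        # Every PE applies the same operation to the same addresses; only the
        # contents of each PE's memory differ. In hardware this loop is one step.
        for pe in self.pes:
            a = pe.memory[src_a]        # read from this PE's data memory
            b = pe.memory[src_b]
            pe.memory[dst] = op(a, b)   # ALU result written back to local memory


# Four PEs, each holding different data at the same addresses.
pes = [PE([i, 10 * i, 0]) for i in range(4)]
cu = ControlUnit(pes)
cu.broadcast(lambda a, b: a + b, src_a=0, src_b=1, dst=2)
print([pe.memory[2] for pe in pes])  # [0, 11, 22, 33]
```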
The throughput of a SIMD parallel processor is the number of PEs multiplied by the throughput of each individual PE. To maximize throughput, a SIMD design is typically a compromise between the number of PEs and the complexity of each PE, since a faster PE is generally also more complex, and fewer complex PEs fit on a chip. A simple "figure of merit" for a particular SIMD design is the number of PEs that can be built on a single integrated circuit (or "chip"), with the relative performance of each PE factored in. Generally, the more PEs per chip, the less costly the implementation for a given level of performance.
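The two quantities in the paragraph above can be written out as simple arithmetic. This is a sketch only: the helper names are hypothetical, the design figures are made up for illustration, and weighting PEs per chip by a relative-performance multiplier is an assumed way of "factoring in" PE performance, since no formula is given.

```python
def simd_throughput(num_pes: int, per_pe_ops_per_sec: float) -> float:
    """Aggregate throughput: number of PEs times the throughput of one PE."""
    return num_pes * per_pe_ops_per_sec


def figure_of_merit(pes_per_chip: float, relative_pe_performance: float = 1.0) -> float:
    """PEs per chip, scaled by each PE's speed relative to a baseline PE (assumed weighting)."""
    return pes_per_chip * relative_pe_performance


# Illustrative, made-up designs: many simple PEs versus fewer, faster PEs.
simple_design = figure_of_merit(pes_per_chip=64, relative_pe_performance=1.0)
complex_design = figure_of_merit(pes_per_chip=16, relative_pe_performance=3.0)

print(simd_throughput(1024, 10e6))    # 10240000000.0 operations per second
print(simple_design, complex_design)  # 64.0 48.0
```

In this made-up comparison the simpler design wins even though each of its PEs is three times slower, which is the compromise the paragraph describes.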
Typical commercially available SIMD parallel computers contain from 128 to 65,536 PEs, each of which typically has 1-bit data paths and registers and operates at a relatively slow 5 or 10 MHz cycle rate; these machines implement from 3 to 72 PEs per chip.
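With 1-bit data paths, a PE handles multi-bit values one bit at a time. The sketch below illustrates bit-serial addition; it does not reflect any particular machine's instruction set. It assumes that operands are stored least-significant bit first in consecutive locations of a PE's data memory and that the ALU processes one bit position per cycle. Under those assumptions, an n-bit addition takes n ALU cycles.

```python
from typing import List, Tuple


def full_adder(a: int, b: int, carry: int) -> Tuple[int, int]:
    """The 1-bit ALU operation: returns (sum_bit, carry_out)."""
    s = a ^ b ^ carry
    c = (a & b) | (a & carry) | (b & carry)
    return s, c


def bit_serial_add(a_bits: List[int], b_bits: List[int]) -> Tuple[List[int], int]:
    """Add two equal-length LSB-first bit lists, one bit position per ALU cycle."""
    carry = 0        # 1-bit carry register inside the PE
    result = []
    cycles = 0       # counts ALU steps only
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        result.append(s)
        cycles += 1
    result.append(carry)  # final carry becomes the most significant bit
    return result, cycles


def to_bits(x: int, width: int) -> List[int]:
    """Split an integer into `width` bits, least significant first."""
    return [(x >> i) & 1 for i in range(width)]


def from_bits(bits: List[int]) -> int:
    """Reassemble an integer from LSB-first bits."""
    return sum(bit << i for i, bit in enumerate(bits))


bits, cycles = bit_serial_add(to_bits(200, 8), to_bits(100, 8))
print(from_bits(bits), cycles)  # 300 8
```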
SIMD parallel computers are frequently used to process image data from television cameras. However, after the image data is received from the camera and digitized, it is in a format generally orthogonal to, and incompatible with, the format required by the PE data memories into which it must be written before each PE can process it. To be practically useful, commercially available SIMD parallel computers solve this problem by including a separate hardware device for data re-formatting, called a "corner turner," which operates on the data outside the PEs. This extra hardware is undesirable, however, because it adds to the cost and complexity of a SIMD parallel computer.
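The re-formatting performed by a corner turner amounts to transposing a bit matrix. A digitizer delivers each pixel as a whole word, while a bit-serial PE needs the bits of its pixel spread across successive memory addresses. The sketch below assumes 8-bit pixels, one pixel per PE, and a memory organized so that each word holds the same bit position for a group of PEs (a bit plane). Both this layout and the function name `corner_turn` are illustrative assumptions, not a description of any particular machine's hardware.

```python
from typing import List


def corner_turn(words: List[int], bits_per_word: int = 8) -> List[int]:
    """Transpose a bit matrix: one pixel per word in, one bit plane per word out.

    Bit j of output word k equals bit k of input word j, so output word k
    holds bit k of every pixel, with one bit lane per PE (assumed layout).
    """
    planes = []
    for k in range(bits_per_word):
        plane = 0
        for j, word in enumerate(words):
            plane |= ((word >> k) & 1) << j
        planes.append(plane)
    return planes


# Four 8-bit pixels as they might arrive from a digitizer, one per PE.
pixels = [0xFF, 0x00, 0x01, 0x80]
print(corner_turn(pixels))  # [5, 1, 1, 1, 1, 1, 1, 9]

# With 8 pixels of 8 bits the matrix is square, so turning twice restores it.
square = [0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC, 0xDE, 0xF0]
assert corner_turn(corner_turn(square)) == square
```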
One example of a commercially available SIMD parallel processor is the AIS-5000, manufactured by Applied Intelligent Systems of Ann Arbor, Mich. The AIS-5000 contains a total of 1,024 PEs and costs about $50,000. It implements each group of 8 PEs with two chips: a custom gate array containing the ALUs and registers for the 8 PEs, and an 8-bit-wide commercially available memory chip that implements the memory associated with each PE. This yields a simple figure of merit of 4 PEs per chip. The basic machine cycle rate is 10 MHz, and there are 32,768 bits of memory per PE. The AIS-5000 contains 512 of these two-chip groups for a total of 1,024 chips. It has corner turning hardware built into the custom gate array chips.
Another example of a commercially available SIMD parallel computer is the CM-1, manufactured by Thinking Machines of Cambridge, Mass. The CM-1 contains a total of 65,536 PEs, costs about $3,000,000, and implements each group of 16 PEs with 5 chips, yielding a simple figure of merit of 3.2 PEs per chip. One chip, a custom gate array, contains the ALUs and registers for the 16 PEs; the other four are commercially available memory chips that provide the memory associated with the 16 PEs. The basic machine cycle rate is 5 MHz, and the machine has 4,096 bits of memory per PE. The CM-1 contains 4,096 of these 5-chip groups for a total of 20,480 chips.
A third representative example of a commercially available SIMD parallel computer is the NCR45SPDS SIMD Processor Development System, manufactured by NCR Corporation, Microelectronics Division, Fort Collins, Colo. It contains a maximum of 10,368 PEs and is implemented with the NCR GAPP chip, which contains 72 PEs, giving a simple figure of merit of 72 PEs per chip. Its cycle rate is 10 MHz, but it has only 128 bits of memory per PE. It also has external corner turning hardware.
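The per-chip figures quoted for these three machines follow directly from their group compositions. As a cross-check, the sketch below recomputes them. The `SimdMachine` class and its `chips_needed` method are hypothetical names; only the numeric values come from the descriptions above.

```python
from dataclasses import dataclass


@dataclass
class SimdMachine:
    """Figures quoted in the text for one SIMD machine, per chip group."""
    name: str
    pes_per_group: int
    chips_per_group: int
    clock_mhz: float
    memory_bits_per_pe: int

    @property
    def pes_per_chip(self) -> float:
        # The simple figure of merit: PEs divided by the chips that implement them.
        return self.pes_per_group / self.chips_per_group

    def chips_needed(self, total_pes: int) -> int:
        # Whole groups required (ceiling division), times chips per group.
        groups = -(-total_pes // self.pes_per_group)
        return groups * self.chips_per_group


machines = [
    SimdMachine("AIS-5000", pes_per_group=8, chips_per_group=2, clock_mhz=10, memory_bits_per_pe=32_768),
    SimdMachine("CM-1", pes_per_group=16, chips_per_group=5, clock_mhz=5, memory_bits_per_pe=4_096),
    SimdMachine("NCR45SPDS", pes_per_group=72, chips_per_group=1, clock_mhz=10, memory_bits_per_pe=128),
]

for m in machines:
    print(f"{m.name:<10} {m.pes_per_chip:5.1f} PEs/chip {m.clock_mhz:4.0f} MHz {m.memory_bits_per_pe:>6} bits/PE")

print(machines[1].chips_needed(65_536))  # 20480, matching the CM-1 chip count
print(machines[2].chips_needed(10_368))  # 144 GAPP chips implied for the maximum configuration
```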