A common computer processing task involves sequentially processing large numbers of data items, such as data corresponding to each of a large number of pixels in an array. Processing data in this manner normally requires fetching each item of data from a memory device, performing a mathematical or logical calculation on that data, and then returning the processed data to the memory device. Performing such processing tasks at high speed is greatly facilitated by a high data bandwidth between the processor and the memory devices. The data bandwidth between a processor and a memory device is proportional to the width of a data path between the processor and the memory device and the frequency at which the data are clocked between the processor and the memory device. Therefore, increasing either of these parameters will increase the data bandwidth between the processor and memory device, and hence the rate at which data can be processed.
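The proportionality described above can be illustrated with a short calculation. The following is a minimal sketch that assumes one transfer of the full path width per clock cycle; the function name and the widths and clock rate used are hypothetical figures chosen only for illustration.

```python
def bandwidth_bytes_per_second(width_bits: int, clock_hz: float) -> float:
    """Peak bandwidth of a data path, assuming one full-width transfer per clock cycle."""
    return width_bits * clock_hz / 8


# Hypothetical figures: a 64-bit path and a 1024-bit path, both clocked at 100 MHz.
narrow = bandwidth_bytes_per_second(64, 100e6)
wide = bandwidth_bytes_per_second(1024, 100e6)

# Widening the path by a factor of 16 at the same clock raises bandwidth by the same factor.
ratio = wide / narrow
```

Under these assumptions, doubling either the path width or the clock frequency doubles the computed bandwidth.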
A memory device having its own processing resource is known as an active memory device. Conventional active memory devices have been provided for mainframe computers in the form of discrete memory devices provided with dedicated processing resources. However, it is now possible to fabricate a memory device, particularly a dynamic random access memory (“DRAM”) device, and one or more processors on a single integrated circuit chip. Single chip active memory devices have several advantageous properties. First, the data path between the DRAM device and the processor can be made very wide to provide a high data bandwidth between the DRAM device and the processor. In contrast, the data path between a discrete DRAM device and a processor is normally limited by constraints on the size of external data buses. Further, because the DRAM device and the processor are on the same chip, the speed at which data can be clocked between the DRAM device and the processor can be relatively high, which also increases data bandwidth. The cost of an active memory device fabricated on a single chip can also be less than the cost of a discrete memory device coupled to an external processor.
Although a wide data path can provide significant benefits, actually realizing these benefits requires that the processing bandwidth of the processor be high enough to keep up with the high bandwidth of the wide data path. One technique for rapidly processing data provided through a wide data path is to perform parallel processing of the data. For example, the data can be processed by a large number of processing elements (“PEs”), each of which processes a respective group of the data bits. One type of parallel processor is known as a single instruction, multiple data (“SIMD”) processor. In a SIMD processor, a large number of PEs simultaneously receive the same instructions, but they each process separate data. The instructions are generally provided to the PEs by a suitable device, such as a microprocessor. The advantages of SIMD processing are simple control, efficient use of available data bandwidth, and minimal logic hardware overhead.
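The SIMD arrangement described above can be modeled in a few lines. This is only a behavioral sketch: the function `simd_step`, the four-PE array, and the pixel-inversion instruction are hypothetical, and the Python loop visits the PEs one after another, whereas real PEs would execute the instruction simultaneously.

```python
from typing import Callable, List


def simd_step(instruction: Callable[[int], int], pe_data: List[int]) -> List[int]:
    """Broadcast one instruction to every PE; each PE applies it to its own data item."""
    return [instruction(item) for item in pe_data]


def invert_pixel(value: int) -> int:
    """Hypothetical instruction: invert an 8-bit pixel value."""
    return 255 - value


# Hypothetical array of four PEs, each holding one 8-bit pixel value.
pixels = [10, 200, 37, 255]
result = simd_step(invert_pixel, pixels)  # [245, 55, 218, 0]
```

Only one instruction is issued per step regardless of the number of PEs, which is the sense in which control remains simple as the array grows.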
An active memory device can be implemented by fabricating a large number of SIMD PEs and a DRAM on a single chip, and coupling each of the PEs to a respective group of columns of the DRAM. Instructions are provided to the PEs from an external device, such as a host microprocessor. The number of PEs included on the chip can be very large, thereby resulting in a massively parallel processor capable of processing vast amounts of data. However, this capability can be achieved only by providing instructions to the PEs at a rate that is fast enough to allow them to operate at their maximum speed.
One technique for providing instructions to the PEs is to supply high-level commands to a processing array control unit (“ACU”), decode these commands in the ACU to generate PE microinstructions, and pass the PE microinstructions to the PEs in the array. However, the microinstructions provided to the PEs consist of a large number of bits, so each microinstruction can have any of a large number of possible values. For example, if the PE microinstructions are 52 bits wide, then the microinstructions can have approximately 4.5×10^15 (that is, 2^52) possible values. If a program memory for the ACU were used in a typical manner to store these microinstructions (in which the program memory stores a corresponding microinstruction at each address), the required size of the program memory, which is normally a random access memory (“RAM”), would be very large, and the memory would be inefficiently used.
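The arithmetic behind the 52-bit example can be checked directly. The sketch below computes the number of possible microinstruction values and the storage needed when each program memory address holds one full-width microinstruction; the helper name and the 4096-entry program length are assumptions made only for illustration.

```python
MICROINSTRUCTION_BITS = 52  # width taken from the example in the text

# Number of distinct values a 52-bit microinstruction can take (about 4.5e15).
possible_values = 2 ** MICROINSTRUCTION_BITS


def direct_store_bits(program_length: int, width_bits: int = MICROINSTRUCTION_BITS) -> int:
    """Bits of program memory needed when each address holds one full microinstruction."""
    return program_length * width_bits


# Hypothetical program of 4096 microinstructions stored one per address.
program_length = 4096
bits_needed = direct_store_bits(program_length)

# Such a program can contain at most program_length distinct values out of 2**52.
max_fraction_used = program_length / possible_values
```

Under these assumptions, every stored word is 52 bits wide even though the program can exercise only a vanishingly small fraction of the values that width can encode.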
There is therefore a need for a system and method for generating and then decoding SIMD PE microinstructions in a manner that allows a relatively small amount of circuitry to operate at a relatively high speed.