A common computer processing task involves sequentially processing large numbers of data items, such as data corresponding to each of a large number of pixels in an array. Processing data in this manner normally requires fetching each item of data from a memory device, performing a mathematical or logical calculation on that data, and then returning the processed data to the memory device. Performing such processing tasks at high speed is greatly facilitated by a high data bandwidth between the processor and the memory devices. The data bandwidth between a processor and a memory device is proportional to the width of a data path between the processor and the memory device and the frequency at which the data are clocked between the processor and the memory device. Therefore, increasing either of these parameters will increase the data bandwidth between the processor and memory device, and hence the rate at which data can be processed.
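As a rough illustration of the proportionality described above, the short sketch below computes peak data bandwidth as path width multiplied by clock frequency. The function name and the example widths and frequencies are hypothetical values chosen for arithmetic clarity. They are not taken from any particular device, and the sketch ignores protocol overhead, multiple transfers per clock, and other real-world factors.

```python
def data_bandwidth_bytes_per_s(path_width_bits: int, clock_hz: float) -> float:
    """Peak bandwidth modeled as (data path width) x (clock frequency), in bytes/s.

    Hypothetical helper: assumes one transfer per clock and no overhead.
    """
    return path_width_bits * clock_hz / 8


# Hypothetical comparison: a narrow external bus vs. a wide on-chip path,
# both clocked at 100 MHz.
narrow = data_bandwidth_bytes_per_s(64, 100e6)    # 64-bit path
wide = data_bandwidth_bytes_per_s(2048, 100e6)    # 2048-bit path
print(narrow, wide, wide / narrow)                # widening the path 32x raises bandwidth 32x
```

Doubling either parameter doubles the result, which is the point the paragraph makes: bandwidth can be raised by widening the path, by clocking it faster, or both.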
An active memory device is a memory device having its own processing resource. It is relatively easy to provide an active memory device with a wide data path, thereby achieving a high memory bandwidth. Conventional active memory devices have been provided for mainframe computers in the form of discrete memory devices having dedicated processing resources. However, it is now possible to fabricate a memory device, particularly a dynamic random access memory (“DRAM”) device, and one or more processors on a single integrated circuit chip. Single chip active memories have several advantageous properties. First, the data path between the DRAM device and the processor can be made very wide to provide a high data bandwidth between the DRAM device and the processor. In contrast, the data path between a discrete DRAM device and a processor is normally limited by constraints on the size of external data buses. Further, because the DRAM device and the processor are on the same chip, the speed at which data can be clocked between the DRAM device and the processor can be relatively high, which further increases the data bandwidth. The cost of an active memory fabricated on a single chip is also less than the cost of a discrete memory device coupled to an external processor.
An active memory device can be designed to operate at a very high speed by parallel processing data using a large number of processing elements (“PEs”), each of which processes a respective group of the data bits. One type of parallel processor is known as a single instruction, multiple data (“SIMD”) processor. In a SIMD processor, each of a large number of PEs simultaneously receives the same instructions, but each PE processes separate data. The instructions are generally provided to the PEs by a suitable device, such as a microprocessor. The advantages of SIMD processing are simple control, efficient use of available data bandwidth, and minimal logic hardware overhead. Another parallel processing architecture is multiple instruction, multiple data (“MIMD”), in which a large number of processing elements process separate data using separate instructions.
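The difference between the two architectures can be sketched as a toy model in which each PE holds one datum and an "instruction" is simply a function applied to that datum. The names `simd_step` and `mimd_step` are hypothetical. The list comprehensions run sequentially, so they model only *what* each PE computes, not the simultaneous execution that real PE hardware provides.

```python
from typing import Callable, List, Sequence

# Hypothetical model: an instruction maps one PE's datum to a result.
Instruction = Callable[[int], int]


def simd_step(instruction: Instruction, pe_data: Sequence[int]) -> List[int]:
    """SIMD: every PE receives the same instruction and applies it to its own datum."""
    return [instruction(x) for x in pe_data]


def mimd_step(instructions: Sequence[Instruction], pe_data: Sequence[int]) -> List[int]:
    """MIMD: each PE applies its own instruction to its own datum."""
    if len(instructions) != len(pe_data):
        raise ValueError("MIMD model needs exactly one instruction per PE")
    return [instr(x) for instr, x in zip(instructions, pe_data)]


print(simd_step(lambda x: x + 1, [1, 2, 3, 4]))
print(mimd_step([lambda x: x + 1, lambda x: x * 2, lambda x: x - 3], [1, 2, 3]))
```

In the SIMD case only one instruction must be distributed no matter how many PEs there are, which reflects the simple control and low hardware overhead noted above.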
A high performance active memory device can be implemented by fabricating a large number of SIMD PEs or MIMD PEs and a DRAM on a single chip, and coupling each of the PEs to respective groups of columns of the DRAM. The instructions are provided to the PEs from an external device, such as a host microprocessor. The number of PEs included on the chip can be very large, thereby resulting in a massively parallel processor capable of processing vast amounts of data.
In operation, data to be operated on by the PEs are first written to the DRAM, generally from an external source such as a disk, network or input/output (“I/O”) device in a host computer system. In response to common instructions passed to all of the PEs, the PEs fetch their respective groups of data, perform the operations called for by the instructions, and then pass data corresponding to the results of the operations back to the DRAM. After they have been written to the DRAM, the results data can be either coupled back to the external source or processed further in a subsequent operation. By operating on the data using active memory devices, particularly active memory devices using SIMD PEs and MIMD PEs, the data can be processed very efficiently. If the same data were operated on by a microprocessor or other central processing unit (“CPU”), it would be necessary to couple substantially smaller blocks of data from the memory device to the CPU for processing, and then write substantially smaller blocks of results data back to the memory device. The wider data bus and faster data transfer speeds made possible by using an active memory instead of a conventional memory result in a significantly higher data bandwidth.
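The sequence of operations described above (host writes operands, a common instruction is broadcast, each PE fetches, processes, and writes back its own group of DRAM columns, and the host reads the results) can be sketched as a toy software model. The class `ActiveMemorySketch`, its methods, and the flat-list representation of the DRAM are all hypothetical simplifications for illustration. The sketch assumes equal-sized column groups and a SIMD-style single instruction, and it processes PEs one after another rather than in parallel.

```python
from typing import Callable, List


class ActiveMemorySketch:
    """Hypothetical toy model: DRAM columns split into equal groups, one group per PE."""

    def __init__(self, num_pes: int, columns_per_pe: int) -> None:
        self.num_pes = num_pes
        self.columns_per_pe = columns_per_pe
        self.dram: List[int] = [0] * (num_pes * columns_per_pe)

    def _group(self, pe: int) -> slice:
        # Columns coupled to processing element `pe`.
        start = pe * self.columns_per_pe
        return slice(start, start + self.columns_per_pe)

    def write_from_host(self, operands: List[int]) -> None:
        # Operand data are written into the DRAM from an external source.
        if len(operands) != len(self.dram):
            raise ValueError("operand count must match DRAM size")
        self.dram[:] = operands

    def execute(self, instruction: Callable[[int], int]) -> None:
        # The same instruction is passed to every PE.
        for pe in range(self.num_pes):
            cols = self._group(pe)
            fetched = self.dram[cols]                    # PE fetches its own group
            results = [instruction(x) for x in fetched]  # PE performs the operation
            self.dram[cols] = results                    # PE writes results back

    def read_to_host(self) -> List[int]:
        # Results data are coupled back to the external source.
        return list(self.dram)


mem = ActiveMemorySketch(num_pes=4, columns_per_pe=2)
mem.write_from_host(list(range(8)))
mem.execute(lambda x: x * x)
print(mem.read_to_host())
```

Note that `write_from_host` and `read_to_host` are outside the PEs' parallel work entirely; only `execute` benefits from the wide internal data path.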
Although an active memory device allows much more efficient processing of data stored in memory, the processing speed of a computer system using active memory devices is somewhat limited by the time required to transfer operand data to the active memory for processing and the time required to transfer results data from the active memory after the operand data have been processed. During such data transfer operations, active memory devices are essentially no more efficient than passive memory devices that also require data stored in the memory device to be transferred to and from an external device, such as a CPU.
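The limiting effect of transfer time can be quantified with an Amdahl's-law-style estimate. The sketch assumes that transferring operands in, processing, and transferring results out happen one after another with no overlap, and that only the processing step is accelerated by the active memory. The function names and the example times are hypothetical.

```python
def overall_speedup(t_in: float, t_compute: float, t_out: float, compute_speedup: float) -> float:
    """Whole-job speedup when only the processing step is accelerated.

    Assumes (hypothetically) that transfers and processing do not overlap.
    """
    before = t_in + t_compute + t_out
    after = t_in + t_compute / compute_speedup + t_out
    return before / after


def speedup_limit(t_in: float, t_compute: float, t_out: float) -> float:
    """Upper bound as processing becomes arbitrarily fast: only transfers remain."""
    return (t_in + t_compute + t_out) / (t_in + t_out)


# Hypothetical job: 10 ms to load operands, 80 ms to process, 10 ms to unload results.
print(overall_speedup(10, 80, 10, compute_speedup=80))  # about 4.76x, not 80x
print(speedup_limit(10, 80, 10))                        # never better than 5x
```

Even an 80-fold faster processing step yields less than a 5-fold overall gain here, because the unchanged transfer time dominates once processing is fast.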
There is therefore a need for a system and method for allowing data to be more efficiently transferred between active memory devices and an external system.