The present invention relates generally to integrated circuit memory devices and, more particularly, to an apparatus and method for implementing a memory array device with built-in computational capability.
Existing computer designs typically provide a direct connection between the processor and its associated memory components. In conventional designs, the values exchanged between the processor and the memory components consist of load/store addresses and load/store data objects going in and out of the processor.
In order to improve the computational power of microprocessors, the processing element or arithmetic logic unit (ALU) may be positioned as close as possible to the source of the data (e.g., the memory array) so as to promote a high data bandwidth between the two structures. Thus, modern microprocessors commonly feature large-capacity memories next to the ALU in the form of, for example, L1, L2 and other caches. Although this added memory improves performance, it also increases the die area and thus the cost of each microprocessor chip.
Other attempts at increasing the computational speed of a processing device involve placing one-bit SIMD (Single-Instruction Stream Multiple-Data Stream) processors within the memory circuitry, adjacent to the sense amplifiers in both SRAM (Static Random Access Memory) and DRAM (Dynamic Random Access Memory) arrays. However, for small memories, the overhead of this bit-wise ALU approach is high. In addition, the operands need to be read out one at a time, and only then can the result be computed in the ALU attached to the sense amplifier.
Accordingly, it would be desirable to be able to implement structures that provide a memory array device with built-in computational capability in a manner that does not require the use of a physical ALU circuit incorporated into the memory structure, and such that multiple operands may be selected together, with the result ready in the same cycle as the operand fetch (read).