Research on a Parallel SIMD Simulation Workbench (PASSWORK) has demonstrated that multiple instruction multiple data (MIMD) vector machines can simulate, at nearly full speed, the global routing and bit-serial operations of commercially available single instruction multiple data (SIMD) machines. Hardware gather/scatter and vector register corner-turning are key to this kind of high-performance SIMD computing on vector machines, as disclosed in pending U.S. patent application Ser. No. 533,233 to Iobst, entitled Apparatus for Performing a Bit Serial Orthogonal Transformation Instruction. In a direct comparison between vector machines and SIMD machines, the only other significant limits on SIMD performance are memory bandwidth and the multiple logical operations required for certain kinds of arithmetic, e.g., a full add on a vector machine or a tally across the processors of a SIMD machine. The results of this research suggest that a good way to support both MIMD and SIMD computations on the same shared-memory machine is to fold SIMD capability into conventional machines rather than to design a completely new machine.
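The following is a minimal Python sketch of the two ideas mentioned above, corner-turning and the multiple logical operations behind a full add. It is a software illustration only, not the hardware instruction of the cited Iobst application. It assumes eight 8-bit values packed into Python integers, and the function names `corner_turn` and `bit_serial_add` are hypothetical. Corner-turning transposes word-oriented data into bit planes, so that each logical operation on one word acts on the same bit of every value at once.

```python
def corner_turn(words, width):
    """Transpose a width x width bit matrix (hypothetical software corner-turn).

    Input:  words[j] holds value j, so bit i of words[j] is bit i of value j.
    Output: planes[i] holds bit i of every value, with value j in bit j.
    The transpose is its own inverse, so the same call turns planes back into words.
    """
    if len(words) != width:
        raise ValueError("this sketch only handles square bit matrices")
    planes = [0] * width
    for j, word in enumerate(words):
        for i in range(width):
            if (word >> i) & 1:
                planes[i] |= 1 << j
    return planes


def bit_serial_add(a_planes, b_planes, lanes):
    """Add two sets of values held in bit-plane form, least significant plane first.

    Each plane is one machine word whose bit j belongs to lane (value) j, so every
    logical operation below acts on all lanes at once. One full add costs five
    logical operations per bit plane (two XOR, two AND, one OR), not counting masking.
    """
    mask = (1 << lanes) - 1
    carry = 0
    sum_planes = []
    for a, b in zip(a_planes, b_planes):
        partial = a ^ b
        sum_planes.append((partial ^ carry) & mask)
        carry = ((a & b) | (carry & partial)) & mask
    return sum_planes  # final carry discarded: sums wrap modulo 2**len(a_planes)


a = [3, 5, 200, 255, 0, 1, 17, 128]
b = [4, 5, 100, 1, 0, 254, 17, 128]
sums = corner_turn(bit_serial_add(corner_turn(a, 8), corner_turn(b, 8), 8), 8)
print(sums)  # [7, 10, 44, 0, 0, 255, 34, 0]
```

The sketch turns the operands into bit planes, adds them one plane at a time, and turns the result back. The per-plane XOR/AND/OR sequence is exactly the "multiple logical operations" cost of a full add on a machine that lacks a native bit-serial adder.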
Even greater SIMD performance on conventional machines may be possible if processors and memories are integrated onto the same chip. More specifically, if one were to design a new kind of memory chip (a process-in-memory chip, or PIM) that associates a single-bit processor with each column of a standard random access memory (RAM) integrated circuit (IC), the increase in SIMD performance might be several orders of magnitude. Moreover, this increase in performance should be possible without significant increases in electrical power, cooling, or space requirements.
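The column-processor idea can be made concrete with a minimal Python sketch that models one PIM chip in software. For illustration only, it assumes that operands are stored bit-serially down each column (one bit per row, least significant bit first) and that each column processor has just a one-bit carry register. The class `PimChipModel` and its methods are hypothetical and are not taken from the source.

```python
class PimChipModel:
    """Hypothetical toy model of a PIM chip (not the patented design).

    The RAM array has `rows` rows of `cols` bits; each row is stored as a Python
    int whose bit j is column j. Column j has its own single-bit processor, and
    the carry registers of all processors are packed into one int the same way,
    so one bitwise operation on a row models every column processor acting at once.
    """

    def __init__(self, rows, cols):
        self.cols = cols
        self.mask = (1 << cols) - 1
        self.ram = [0] * rows
        self.carry = 0          # bit j = carry register of column processor j
        self.row_accesses = 0   # on-chip row reads/writes; no data pins involved

    def read(self, r):
        self.row_accesses += 1
        return self.ram[r]

    def write(self, r, value):
        self.row_accesses += 1
        self.ram[r] = value & self.mask

    def store_values(self, values, base_row, nbits):
        """Place values[j] in column j: bit k of values[j] goes to row base_row + k."""
        for k in range(nbits):
            row = 0
            for j, v in enumerate(values):
                row |= ((v >> k) & 1) << j
            self.write(base_row + k, row)

    def load_values(self, base_row, nbits):
        """Read back the nbits-wide value held in each column."""
        rows = [self.read(base_row + k) for k in range(nbits)]
        return [sum(((rows[k] >> j) & 1) << k for k in range(nbits))
                for j in range(self.cols)]

    def add_fields(self, a_row, b_row, dest_row, nbits):
        """Bit-serial add in every column at once, least significant row first."""
        self.carry = 0
        for k in range(nbits):
            a = self.read(a_row + k)
            b = self.read(b_row + k)
            partial = a ^ b
            self.write(dest_row + k, partial ^ self.carry)
            self.carry = (a & b) | (self.carry & partial)


chip = PimChipModel(rows=24, cols=4)
chip.store_values([3, 10, 200, 255], base_row=0, nbits=8)
chip.store_values([4, 20, 100, 1], base_row=8, nbits=8)
chip.row_accesses = 0
chip.add_fields(a_row=0, b_row=8, dest_row=16, nbits=8)
print(chip.row_accesses)                       # 24
print(chip.load_values(base_row=16, nbits=8))  # [7, 30, 44, 0]
```

In this toy model the 8-bit addition takes 24 row accesses (two reads and one write per bit) whether the array has 4 columns or thousands, because each row access serves every column processor at once. None of those accesses crosses the chip's data pins.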
This basic idea breaks the von Neumann bottleneck between a central processing unit (CPU) and memory by computing directly in the memory, and it allows a natural evolution from a conventional computing environment to a mixed MIMD/SIMD computing environment. Applications in this mixed computing environment are only now beginning to be explored.
The present invention relates to a PIM chip that combines memory and computation on the same integrated circuit, maximizing instruction/data bandwidth between processors and memories by eliminating most of the need for input/output across data pins. The chip contains multiple single-bit computational processors that are all driven in parallel, with processor counts ranging from a few to possibly thousands per chip. The chips are then assembled into groups or systems of memory banks that enhance or replace existing memory subsystems in computers ranging from personal computers to supercomputers.