A central processing unit (CPU), or processor, of a computer fetches instructions in sequence from memory, decodes each instruction, executes it, and writes the result back to registers or memory. The CPU makes use of various units such as a fetch unit, a program counter, and memory interface units including registers and stacks. CPU architecture has continually evolved to make this instruction-processing cycle faster. Architectural features such as instruction pipelining, branch prediction, superscalar execution, vector processing, and multithreading have been implemented successfully. High-speed cache memories, which require their own control logic, are also widely used. Placing multiple processors on a single chip allows the chip to process different instructions simultaneously.
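The fetch, decode, execute and write-back cycle described above can be sketched as a toy interpreter. This is a minimal illustration, not a model of any real CPU: the four-opcode instruction set (`LOAD`, `ADD`, `STORE`, `HALT`), the single accumulator register, the tuple encoding of instructions, and the `mem_accesses` counter are all hypothetical. The counter makes one point concrete: every instruction costs at least one memory access just to be fetched, before any data is read or written.

```python
from dataclasses import dataclass


@dataclass
class ToyCPU:
    memory: list            # unified memory: instruction tuples and integer data
    pc: int = 0             # program counter: address of the next instruction
    acc: int = 0            # single accumulator register (hypothetical)
    halted: bool = False
    mem_accesses: int = 0   # number of memory reads and writes performed

    def step(self) -> None:
        # Fetch: read the instruction at the program counter and advance it.
        instr = self.memory[self.pc]
        self.mem_accesses += 1
        self.pc += 1
        # Decode: split the instruction into an opcode and an operand address.
        opcode, addr = instr
        # Execute, then write the result back to the register or to memory.
        if opcode == "LOAD":
            self.acc = self.memory[addr]
            self.mem_accesses += 1
        elif opcode == "ADD":
            self.acc += self.memory[addr]
            self.mem_accesses += 1
        elif opcode == "STORE":
            self.memory[addr] = self.acc
            self.mem_accesses += 1
        elif opcode == "HALT":
            self.halted = True
        else:
            raise ValueError(f"unknown opcode: {opcode!r}")

    def run(self, max_steps: int = 1000) -> None:
        # Repeat the cycle until HALT (or a step limit, to avoid runaway loops).
        for _ in range(max_steps):
            if self.halted:
                break
            self.step()


# Addresses 0-3 hold the program; addresses 5 and 6 hold data; 7 receives the sum.
program = [("LOAD", 5), ("ADD", 6), ("STORE", 7), ("HALT", 0), 0, 2, 3, 0]
cpu = ToyCPU(program)
cpu.run()
print(cpu.memory[7], cpu.mem_accesses)  # 5 7
```

Here four instructions need seven memory accesses, four of them only to fetch the instructions themselves.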
Despite all of these architectural developments, a large share of processing time is still spent fetching instructions and waiting for memory to respond. There is a major mismatch between the speed at which the processor operates and the speed at which data and instructions can be fetched from memory, often called the processor-memory speed gap. Architectural features improve the processing speed of computers, but memory speed still lags behind by a wide margin.
To reduce this speed mismatch, new processor architectures and approaches have been emerging. One approach combines logic with memory on a single chip; processors built this way are described as IRAM (Intelligent Random Access Memory). DRAM is preferred over SRAM for the memory because it is denser. Because the processor and memory are on the same chip, the IRAM approach reduces the speed mismatch and improves access speed. The IRAM architecture also offers other advantages, such as higher bandwidth, lower access latency, better energy efficiency, and reduced space requirements. Another approach, MPPA (Massively Parallel Processor Array), couples multiple CPUs and RAM memories on the same chip, and each processor may run its own instruction stream using its assigned memory. Variations of this idea have also been implemented in an architecture called “Processing In Memory” (PIM). In the literature, “active memories” is another term used to describe similar approaches.
These approaches provide processors and associated local memories connected through fast interconnects. Tasks are scheduled on these processors based on data availability within the different local memories, and a single host processor generally controls overall computer operation. Because the processor and memory are more tightly integrated, memory access speed is increased. The main problem with these approaches is that they require significant software and toolchain changes to support current software programming models. Many of the architectures referred to above specify their own software programming models, tools, programming languages, compilers, etc., which would make using the currently prevailing models impractical.
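Scheduling tasks according to which local memory holds their data can be sketched as follows. This is an illustrative policy under stated assumptions, not the scheduler of any particular IRAM, MPPA or PIM system: the `schedule_by_locality` function, the model in which each task needs exactly one data block, the least-loaded tie-break, and the fallback of unplaced tasks to the host processor are all hypothetical.

```python
from collections import defaultdict


def schedule_by_locality(tasks, local_memories):
    """Assign each task to a processor whose local memory holds the task's data.

    tasks: dict of task name -> id of the single data block it needs (hypothetical model)
    local_memories: dict of processor id -> set of data block ids held locally
    Returns a dict of processor id -> list of task names. Tasks whose data is in
    no local memory are assigned to the key "host" (an assumed fallback).
    """
    assignment = defaultdict(list)
    for task, block in tasks.items():
        # Processors that already hold the needed data block locally.
        holders = [p for p, blocks in local_memories.items() if block in blocks]
        if holders:
            # Among those, pick the one with the fewest tasks so far
            # (.get avoids creating empty entries in the defaultdict).
            target = min(holders, key=lambda p: len(assignment.get(p, [])))
        else:
            target = "host"
        assignment[target].append(task)
    return dict(assignment)


local_memories = {"P0": {"a", "b"}, "P1": {"b", "c"}}
tasks = {"t1": "a", "t2": "b", "t3": "b", "t4": "d"}
plan = schedule_by_locality(tasks, local_memories)
print(plan)  # {'P0': ['t1', 't3'], 'P1': ['t2'], 'host': ['t4']}
```

With this input, `t2` goes to `P1` because `P0` is already busy with `t1`, and `t4` falls back to the host because no local memory holds block `d`. A real system would also weigh transfer costs over the interconnect, which this sketch ignores.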