The invention relates generally to a computing memory, and further to a computing system.
In von Neumann machines, a central processor (a central processing unit, CPU, or a graphics processing unit, GPU) employs several mechanisms to overcome the so-called “Memory Wall”, a term denoting the growing performance gap between ever faster processors and comparatively slower memory technologies. These mechanisms focus in particular on tolerating longer access latencies of the main memory system (with the latency expressed in processor cycles) in order to minimize the time that the processor's execution units are stalled, or in other words, to maximize the utilization of the execution unit(s).
One of the most important of these mechanisms is the use of a memory hierarchy composed of multiple levels of fast caches. Other mechanisms include support for out-of-order execution of instructions and multi-threading, both of which allow the processor to continue processing other instructions and/or threads while certain instructions or threads are stalled waiting for data to arrive from the memory system.
Another mechanism for reducing the (average) access latency is prefetching data from the memory system.
The above-mentioned techniques were introduced at a time when processor and memory system designs were not limited by power. Furthermore, the focus was mainly on maximizing execution pipeline utilization by reducing the memory access latency. As a result, these mechanisms are typically among the most power-hungry components of a computer system, and they also waste a considerable amount of memory bandwidth. For example, if the processor needs only a single byte, a complete cache line may still be retrieved from the memory system, and the remaining bytes of that line go unused. The same applies to prefetched data, which is typically only partially processed, if at all. In both cases, not only is memory bandwidth wasted, but power is also consumed by unneeded data accesses and operations.
Several disclosures relate to active memory devices and the associated memory access.
U.S. Pat. No. 8,713,335 B2 discloses a parallel processing computing system which includes an ordered set of m memory banks and a processor core. The ordered set of m memory banks includes a first and a last memory bank, wherein m is an integer greater than 1. The processor core implements n virtual processors, a pipeline having p ordered stages, including a memory operation stage, and a virtual processor selector function.
Document US 2014/0149759 A1 discloses a processor including multiple cores, each of which independently executes instructions, and a power control unit (PCU) coupled to the cores to control the power consumption of the processor. In turn, the PCU includes controller logic that causes the processor to re-enter a first package low-power state in response to expiration of an inter-arrival timer, where the expiration indicates that the time duration following a transaction received in the processor has elapsed.
However, there may be a need to improve the power efficiency of a processor/memory system. Furthermore, there may be a need to overcome the “Memory Wall” problem.