Industrial chip designs are moving towards chip multi-processors (CMPs). Compared with high-frequency uniprocessors, CMPs provide improved performance and reduced power consumption. CMPs use relatively simple cores and rely on thread-level parallelism (TLP) to improve performance. Applications running on CMPs must therefore expose more TLP to use the available cores efficiently.
Increasing TLP, however, also increases memory-level parallelism (MLP) by increasing the number of outstanding memory requests per clock cycle. In addition, the data working set of multi-threaded applications is likely to grow with the thread count. Finally, increasing TLP is likely to make accesses to the shared caches more random, since accesses from several threads are interleaved. Thus, memory bandwidth and cache capacity should scale with core count to support the increased MLP and data footprint.
Recently, three-dimensional (3D) die stacking has been proposed as a viable option for stacking a dense memory die (such as DRAM) on a microprocessor die. Stacking allows disparate Si technologies to be combined in a die stack without the need to integrate them into a single process flow. Stacking also provides a very high-bandwidth interface between the dies using through-silicon vias. Thus, 3D stacking of memory (such as DRAM) on CMPs may effectively address the memory hierarchy hurdles for CMP scaling.
DRAMs typically keep a row activated (open) until another row is required, in order to reduce access latency. This technique is called an open-page policy, and it works best when successive accesses to the DRAM fall in the same open page. Otherwise, the bank must close the open row and precharge before the new row can be activated, and these penalties significantly increase the total latency of accessing the bank. Traditionally, DRAM designs allow only one open page per bank.
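The effect of the open-page policy on interleaved accesses can be illustrated with a minimal sketch of a single bank holding one open row. The timing values `T_CAS`, `T_RCD`, and `T_RP` and the function `bank_latency` are hypothetical and chosen only for illustration; they are not parameters of any particular DRAM part or of the design studied here.

```python
# Hypothetical timing parameters in DRAM clock cycles (illustrative only).
T_CAS = 10  # column access to a row that is already open
T_RCD = 10  # activate: open a row in the bank
T_RP = 10   # precharge: close the currently open row


def bank_latency(rows):
    """Total latency of a sequence of row accesses to one bank that
    keeps a single row open (open-page policy)."""
    open_row = None  # no row is open initially
    total = 0
    for row in rows:
        if row == open_row:
            total += T_CAS                 # page hit
        elif open_row is None:
            total += T_RCD + T_CAS         # bank idle: activate, then access
        else:
            total += T_RP + T_RCD + T_CAS  # page conflict: close, activate, access
        open_row = row
    return total


# Two threads, each streaming through its own row of the same bank.
thread_a = [0] * 4
thread_b = [1] * 4

# One thread after the other versus strictly alternating accesses.
sequential = thread_a + thread_b
interleaved = [row for pair in zip(thread_a, thread_b) for row in pair]

print(bank_latency(sequential))   # 110
print(bank_latency(interleaved))  # 230
```

Under these assumed timings, the same eight accesses take roughly twice as long when the two threads alternate, because every access after the first closes the other thread's page. This is the behavior that makes a single open page per bank a poor fit for heavily interleaved multi-threaded traffic.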