Off-chip memory bandwidth is widely considered one of the major factors limiting processor performance, especially in multi-core and many-core processors. Conventional processor designs allocate a large portion of the off-chip pins to power delivery, leaving relatively few pins for signal communication. During memory-intensive phases, however, a processor generally draws much less power than its power-delivery pins can supply. This is because cores that are waiting for data to be fetched from off-chip memory can have their frequencies scaled down to save power without significantly degrading performance.
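The frequency-scaling argument can be illustrated with a simplified timing model. The Python sketch below rests on stated assumptions: compute time scales inversely with core frequency, memory stall time is set by the off-chip memory and does not depend on core frequency, and the cycle counts, stall times, and clock rates are hypothetical values chosen only for illustration.

```python
def execution_time_s(compute_cycles: float, core_freq_hz: float, memory_stall_s: float) -> float:
    """Simplified model (assumption): compute time scales with 1/f; memory stall time does not."""
    return compute_cycles / core_freq_hz + memory_stall_s


def slowdown(compute_cycles: float, memory_stall_s: float,
             base_freq_hz: float, scaled_freq_hz: float) -> float:
    """Ratio of execution time at the scaled frequency to execution time at the base frequency."""
    base = execution_time_s(compute_cycles, base_freq_hz, memory_stall_s)
    scaled = execution_time_s(compute_cycles, scaled_freq_hz, memory_stall_s)
    return scaled / base


# Hypothetical phases: the core clock is halved from 3.0 GHz to 1.5 GHz in both cases.
memory_bound = slowdown(compute_cycles=1e8, memory_stall_s=0.5,
                        base_freq_hz=3.0e9, scaled_freq_hz=1.5e9)
compute_bound = slowdown(compute_cycles=1e10, memory_stall_s=0.01,
                         base_freq_hz=3.0e9, scaled_freq_hz=1.5e9)
print(f"memory-bound slowdown:  {memory_bound:.3f}")
print(f"compute-bound slowdown: {compute_bound:.3f}")
```

Under these assumed numbers, halving the clock slows the memory-bound phase by roughly 6% but nearly doubles the execution time of the compute-bound phase. Because dynamic power falls at least roughly in proportion to frequency, the memory-bound phase leaves much of the power-delivery capacity unused.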
As memory-intensive applications such as web servers, database software, and data-analysis tools come to predominate, the focus of computer architects has shifted from Instruction Level Parallelism (ILP) to Memory Level Parallelism (MLP). The term “Memory Wall” was coined to describe the disparity between the rate of core performance improvement and the relatively stagnant rate of off-chip memory bandwidth growth. Integrating additional cores on the same die and running additional concurrent applications widen this gap, since each core may generate a substantial number of memory requests that must be queued and served by the memory subsystem. Consequently, the capability of the off-chip memory system largely determines both per-core performance and the overall performance of the entire system. When the off-chip memory is not fast enough to serve all memory transactions in a timely manner, slow memory accesses are likely to bottleneck system performance.
Several studies have proposed physically altering the main memory of a Dynamic Random Access Memory (DRAM)-based memory system to improve performance and energy efficiency. One study proposed running the memory bus at a higher frequency than the DRAM modules to improve channel bandwidth, with the resulting rate mismatch resolved by a synchronization buffer for data and commands inside the Dual In-line Memory Module (DIMM). Other studies have explored using low-power double data rate (LPDDR2) memory in place of conventional DDR3 memory because of its higher energy efficiency.
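The channel-bandwidth side of the faster-bus idea follows from standard DDR peak-bandwidth arithmetic: a DDR interface transfers data on both clock edges, so its peak bandwidth is the bus clock times two times the bus width in bytes. The sketch below assumes a 64-bit (8-byte) channel and hypothetical clock rates; it models peak bandwidth only and ignores the synchronization buffer, command overheads, and refresh.

```python
def ddr_peak_bandwidth_bytes_per_s(bus_clock_hz: int, bus_width_bits: int = 64) -> int:
    """Peak bandwidth of a DDR interface: two data transfers per clock cycle (both clock edges)."""
    transfers_per_second = 2 * bus_clock_hz
    return transfers_per_second * (bus_width_bits // 8)


# Hypothetical clocks: DRAM devices at 800 MHz, channel bus run at twice that rate.
device_rate = ddr_peak_bandwidth_bytes_per_s(800_000_000)
channel_rate = ddr_peak_bandwidth_bytes_per_s(1_600_000_000)
print(device_rate / 1e9, "GB/s at the DRAM device rate")
print(channel_rate / 1e9, "GB/s at the faster bus rate")
```

The buffer on the DIMM must absorb the difference between these two rates; how it does so is specific to the study described above and is not modeled here.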
To reduce bank access delay and thereby increase memory bandwidth, architects have optimized the memory system at the rank and bank levels. One study subdivided conventional ranks into mini-ranks with a narrower data width; these mini-ranks can be operated independently through a small chip on each DIMM, improving DRAM energy efficiency. Rank subsetting has also been proposed to improve the reliability and performance of a memory system.
Within a DRAM bank, increasing the row-buffer hit ratio can also improve energy efficiency and performance. One study partitioned the row buffer of a bank into multiple sub-arrays to reduce the row-buffer miss rate. Another approach uses an asymmetric DRAM bank organization to reduce bank access latency and improve system performance.
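The benefit of a higher row-buffer hit ratio can be made concrete with a simplified average-latency model. This sketch assumes that a row-buffer hit costs only the column access latency (tCL), that every miss is a row conflict costing precharge (tRP), activation (tRCD), and column access (tCL), and that all three timings equal a hypothetical 15 ns; queuing, data-transfer time, and refresh are ignored.

```python
def avg_access_latency_ns(hit_ratio: float, t_cl_ns: float, t_rcd_ns: float, t_rp_ns: float) -> float:
    """Average bank access latency for a given row-buffer hit ratio (simplified model)."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    hit_latency = t_cl_ns                        # requested row is already open in the row buffer
    miss_latency = t_rp_ns + t_rcd_ns + t_cl_ns  # close the open row, activate the new row, read column
    return hit_ratio * hit_latency + (1.0 - hit_ratio) * miss_latency


# Hypothetical timings: tCL = tRCD = tRP = 15 ns.
for h in (0.5, 0.8):
    print(f"hit ratio {h:.1f}: {avg_access_latency_ns(h, 15.0, 15.0, 15.0):.1f} ns")
```

Under these assumed timings, raising the hit ratio from 0.5 to 0.8 lowers the average access latency from 30 ns to 21 ns.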
Some studies have specifically stressed the importance of off-chip bandwidth. To increase the overall energy efficiency of a memory system, one study split a 64-bit data bus into eight 8-bit data buses, reducing queuing delay at the expense of longer data-transfer delay. Another study designed a memory scheduler that applies principles of reinforcement learning to learn program behavior and boost performance. Yet another study improved multi-threaded performance with a memory scheduler that provides each thread with fair access to DRAM.
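The trade-off between queuing delay and transfer delay in the split-bus design can be illustrated with a toy model. The hypothetical function below is not the cited study's model: it assumes 64-byte cache-line requests that all arrive at the same time, one bus beat per time unit, and round-robin assignment of requests to the available buses.

```python
def bus_delays(num_requests: int, line_bytes: int, bus_width_bits: int,
               num_buses: int) -> tuple[int, float]:
    """Return (transfer beats per request, average queuing delay in beats) for a toy model."""
    beats_per_line = line_bytes * 8 // bus_width_bits  # each beat moves bus_width_bits bits
    total_wait = 0
    for i in range(num_requests):
        # Request i goes to bus i % num_buses; i // num_buses earlier requests are ahead of it
        # on that bus, each occupying the bus for beats_per_line beats.
        total_wait += (i // num_buses) * beats_per_line
    return beats_per_line, total_wait / num_requests


wide = bus_delays(num_requests=8, line_bytes=64, bus_width_bits=64, num_buses=1)
narrow = bus_delays(num_requests=8, line_bytes=64, bus_width_bits=8, num_buses=8)
print("one 64-bit bus (transfer beats, avg queue beats):", wide)      # (8, 28.0)
print("eight 8-bit buses (transfer beats, avg queue beats):", narrow)  # (64, 0.0)
```

In this toy setting, splitting the bus eliminates queuing delay for the eight simultaneous requests but lengthens each transfer eightfold, which is the trade-off described above.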
Architects have employed several sophisticated methods to balance core and memory performance; however, few of these methods increase off-chip bandwidth beyond the limit imposed by static pin allocation.
Therefore, a long-standing but unmet need exists for apparatuses and methods that increase off-chip bandwidth beyond the limit of static pin allocation, mitigating the shortage of off-chip bandwidth during the memory-intensive phases of program execution and thereby improving the performance of processors, including multi-core processors, on memory-intensive tasks. The present invention satisfies this need.