Graphics processing units (GPUs) and other throughput processing architectures have scaled performance through simultaneous improvements in compute capability and aggregate memory bandwidth. To continue on this trajectory, future systems will soon require more than 1 TB/s of bandwidth. Satisfying this growing bandwidth demand without a significant increase in the power budget for the dynamic random-access memory (DRAM) is a key challenge. Traditional bandwidth-optimized graphics double data rate type five (GDDR5) memories rely on off-package high-speed signaling across a printed circuit board (PCB), which can consume a significant portion of the system energy budget and becomes prohibitive as bandwidths scale beyond 1 TB/s.
On-package stacked memories, such as High Bandwidth Memory (HBM), allow the processor and memory to communicate via short links within a package, thereby reducing the energy cost of data transfer on the interface between the DRAM stack and the processor die. While this improved signaling reduces the input/output (I/O) energy (the energy spent on the link between the DRAM and processor dies), the energy required to move data from the DRAM bit cells to the I/O interface of the DRAM device or stack remains unchanged. Consequently, as the bandwidth demands of future GPUs are projected to grow, further energy reductions within the DRAM itself are required to enable future high-bandwidth systems. Thus, there is a need for addressing these issues and/or other issues associated with the prior art.
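The argument above can be illustrated with a short calculation. Sustained memory power is the bit rate multiplied by the energy spent per bit, and that energy splits into a link (I/O) component and a component internal to the DRAM. The sketch below is illustrative only: the function name and every per-bit energy value are hypothetical placeholders chosen for the example, not figures taken from this document or from any device datasheet.

```python
def memory_power_watts(bandwidth_bytes_per_s, io_pj_per_bit, dram_internal_pj_per_bit):
    """Return (io_watts, internal_watts, total_watts) at a sustained bandwidth.

    Power (W) = bits per second * energy per bit (J). Inputs are in pJ/bit.
    """
    bits_per_s = bandwidth_bytes_per_s * 8
    io_watts = bits_per_s * io_pj_per_bit * 1e-12
    internal_watts = bits_per_s * dram_internal_pj_per_bit * 1e-12
    return io_watts, internal_watts, io_watts + internal_watts


# All values below are HYPOTHETICAL placeholders, used only to show the trend.
BANDWIDTH = 1e12                 # 1 TB/s, the threshold discussed in the text
HYPOTHETICAL_OFF_PACKAGE_IO = 8.0  # pJ/bit, assumed cost of signaling across a PCB
HYPOTHETICAL_ON_PACKAGE_IO = 1.0   # pJ/bit, assumed cost of short in-package links
HYPOTHETICAL_DRAM_INTERNAL = 6.0   # pJ/bit, assumed bit-cell-to-I/O cost (same in both cases)

for label, io_energy in [("off-package", HYPOTHETICAL_OFF_PACKAGE_IO),
                         ("on-package", HYPOTHETICAL_ON_PACKAGE_IO)]:
    io_w, internal_w, total_w = memory_power_watts(
        BANDWIDTH, io_energy, HYPOTHETICAL_DRAM_INTERNAL)
    print(f"{label}: I/O {io_w:.1f} W, DRAM-internal {internal_w:.1f} W, total {total_w:.1f} W")
```

Under these assumed values, moving the memory on-package shrinks only the I/O term. The DRAM-internal term is identical in both cases and scales linearly with bandwidth, which is why further energy reductions within the DRAM itself become the limiting factor.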