Memory capacities, for example those of dynamic random access memory (DRAM), have increased steadily over the past few years, and the trend is expected to continue. While the large capacity is used in bursts when applications are running, a large portion of the memory sits unused much of the time. The entire memory, however, remains powered at all times and is continuously refreshed. DRAM needs to be refreshed periodically because its cells leak charge over time. If a large portion of the memory is not being used, the data in the unused portions, although of no value, is still being refreshed, which costs both power and performance.
As integrated circuit fabrication technology improves, manufacturers are able to integrate additional functionality onto a single silicon substrate. As the number of functions increases, so does the number of components on a single integrated circuit (IC) chip. Additional components add signal switching, which in turn generates more heat and/or consumes more power. The additional heat may damage components on the chip through, for example, thermal expansion. Also, the additional power consumption may limit usage locations and/or usage models for such devices, especially for devices that rely on battery power to function. Hence, efficient power management can have a direct impact on the efficiency, longevity, and usage models of electronic devices.
Moreover, current parallel graphics data processing includes systems and methods developed to perform specific operations on graphics data such as linear interpolation, tessellation, rasterization, texture mapping, and depth testing. Traditionally, graphics processors used fixed function computational units to process graphics data; however, more recently, portions of graphics processors have been made programmable, enabling such processors to support a wider variety of operations for processing vertex and fragment data.
To further increase performance, graphics processors typically implement processing techniques such as pipelining that attempt to process, in parallel, as much graphics data as possible throughout the different parts of the graphics pipeline. Parallel graphics processors with single instruction, multiple thread (SIMT) architectures are designed to maximize the amount of parallel processing in the graphics pipeline. In an SIMT architecture, groups of parallel threads attempt to execute program instructions synchronously together as often as possible to increase processing efficiency. A general overview of software and hardware for SIMT architectures can be found in Shane Cook, CUDA Programming, Chapter 3, pages 37-51 (2013) and/or Nicholas Wilt, CUDA Handbook, A Comprehensive Guide to GPU Programming, Sections 2.6.2 to 3.1.2 (June 2013).
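By way of illustration only, the lockstep behavior of a thread group in an SIMT architecture may be sketched in software as follows. The sketch is a simplified model and not a description of any particular hardware: the group size WARP_SIZE, the function run_simt_branch, and the sample operations are hypothetical names and values chosen for readability. It shows that when every thread of a group takes the same side of a branch, one pass of instructions suffices, whereas threads that disagree cause both sides to be issued in turn with inactive threads masked off.

```python
from typing import Callable, List, Tuple

WARP_SIZE = 4  # hypothetical group size, kept small for readability


def run_simt_branch(
    values: List[int],
    predicate: Callable[[int], bool],
    then_op: Callable[[int], int],
    else_op: Callable[[int], int],
) -> Tuple[List[int], int]:
    """Model an if/else executed in lockstep by one group of threads.

    Returns the per-thread results and the number of instruction passes
    issued (1 when all threads agree on the branch, 2 when they diverge).
    """
    assert len(values) == WARP_SIZE
    # Each thread evaluates the branch condition on its own data.
    mask = [predicate(v) for v in values]
    results = list(values)
    passes = 0

    # "Then" side: issued once if at least one thread needs it.
    if any(mask):
        passes += 1
        for lane, active in enumerate(mask):
            if active:  # threads with a false condition are masked off
                results[lane] = then_op(values[lane])

    # "Else" side: issued once if at least one thread needs it.
    if not all(mask):
        passes += 1
        for lane, active in enumerate(mask):
            if not active:
                results[lane] = else_op(values[lane])

    return results, passes


def is_even(v: int) -> bool:
    return v % 2 == 0


# All threads take the same path, so a single pass is issued.
uniform = run_simt_branch([2, 4, 6, 8], is_even, lambda v: v * 10, lambda v: v + 1)

# Threads disagree, so both paths are issued one after the other.
divergent = run_simt_branch([1, 2, 3, 4], is_even, lambda v: v * 10, lambda v: v + 1)
```

In this model the uniform group completes in one pass and the divergent group in two, which is one reason such architectures attempt to keep the threads of a group executing the same instructions as often as possible.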