1. Field of the Invention
Embodiments of the present invention generally relate to DRAM (dynamic random access memory) controller systems and, more specifically, to systems for efficient retrieval from a tiled memory surface to a linear memory display.
2. Description of the Related Art
Modern graphics processing units (GPUs) commonly arrange data in memory to have two-dimensional (2D) locality. More specifically, a linear sequence of 256 bytes in memory, referred to herein as a “group of blocks” (GOB), may represent four rows and sixteen columns in a 2D surface residing in memory. As is known in the art, organizing memory as a 2D surface improves access efficiency for graphics processing operations that exhibit 2D locality. For example, the rasterization unit within a GPU tends to access pixels within a moving, but localized, 2D region in order to rasterize a triangle within a rendered scene. By organizing memory to have 2D locality, pixels that are localized within a given 2D region are also localized in a linear span of memory, thereby allowing more efficient memory access.
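By way of illustration only, the mapping from a 2D pixel coordinate to a byte offset in such a tiled surface may be sketched as follows. The sketch assumes four bytes per pixel (so that four rows of sixteen columns occupy 256 bytes), row-major ordering of pixels within a GOB, and row-major ordering of GOBs across the surface. None of these details is specified above, and the function names are hypothetical.

```python
# Illustrative sketch only; the layout details below are assumptions.
GOB_ROWS = 4          # rows per GOB, as described above
GOB_COLS = 16         # columns per GOB, as described above
BYTES_PER_PIXEL = 4   # assumption: 256 bytes / (4 rows * 16 columns)
GOB_BYTES = GOB_ROWS * GOB_COLS * BYTES_PER_PIXEL  # 256


def tiled_offset(x, y, surface_width_px):
    """Byte offset of pixel (x, y) in a GOB-tiled surface (assumed layout)."""
    if surface_width_px % GOB_COLS != 0:
        raise ValueError("surface width must be a multiple of the GOB width")
    gobs_per_row = surface_width_px // GOB_COLS
    gob_x, in_x = divmod(x, GOB_COLS)   # which GOB column, column inside it
    gob_y, in_y = divmod(y, GOB_ROWS)   # which GOB row, row inside it
    gob_index = gob_y * gobs_per_row + gob_x
    return gob_index * GOB_BYTES + (in_y * GOB_COLS + in_x) * BYTES_PER_PIXEL


def linear_offset(x, y, surface_width_px):
    """Byte offset of pixel (x, y) in a conventional row-major surface."""
    return (y * surface_width_px + x) * BYTES_PER_PIXEL
```

Under these assumptions, for a surface 64 pixels wide, the pixel directly below the origin lies 64 bytes away in the tiled layout but 256 bytes away in the linear layout, which reflects the 2D locality described above.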
While structuring memory to accommodate 2D locality benefits many of the graphics processing operations included in the GPU, certain other types of access patterns generated within the GPU are oftentimes made less efficient. The display controller within the GPU, for example, typically accesses only one row of data from memory at a time. Each such row normally spans multiple GOBs in the horizontal dimension. However, the memory controller within the GPU typically reads two or more rows of data from memory at a time when a GOB is accessed. Thus, when the display controller requests data from the memory controller for one specific row of data, the memory controller actually reads two or more rows of data to fulfill the read request. As a result, the data path between the memory controller and the display controller must be sized to accommodate the additional bandwidth associated with the extra data read from memory by the memory controller, even though this extra data is discarded by the display controller. Die area is consequently wasted since the data channel ends up carrying unused data.
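The extra traffic described above may be estimated with a short calculation. The sketch below is illustrative only: the function name is hypothetical, four bytes per pixel is an assumption, and the number of rows returned per access is left as a parameter because the description states only that it is two or more.

```python
# Illustrative sketch only; names and the bytes-per-pixel value are assumptions.
def scanline_traffic(width_px, rows_per_access, bytes_per_pixel=4):
    """Return (bytes used, bytes carried, bytes discarded) for one display row.

    Models a memory controller that returns rows_per_access rows of data
    for every single row the display controller requests.
    """
    used = width_px * bytes_per_pixel
    carried = used * rows_per_access
    return used, carried, carried - used
```

For example, under these assumptions a 1920-pixel display row uses 7,680 bytes. If two rows are returned per access, the data path carries 15,360 bytes, half of which is discarded; if four rows are returned, three quarters of the carried data is discarded.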
One potential solution to this problem is to add a data buffer to the display controller so that the otherwise discarded data is instead buffered in the display controller for use in a subsequent display line. While this solution may improve overall memory use, since each row of data is read from memory only once and no data is discarded, the data path between the memory controller and the display controller must still be large enough to carry the multiple rows of data read from memory by the memory controller. Thus, this solution adds the expense of an on-chip data buffer without decreasing the expense of the data path between the memory controller and the display controller.
As the foregoing illustrates, what is needed in the art is a way to optimize the size of the on-chip data path between the memory controller and the display controller within a GPU.