1. Field of the Invention
Embodiments of the present invention generally relate to mapping DRAM (dynamic random access memory) partitions to virtual memory pages and, more specifically, to using a non-power-of-two virtual memory page size.
2. Description of the Related Art
Conventional multithreaded processing systems for graphics processing use off-chip memory to store image data and texture map data. A frame buffer memory constructed of DRAM devices is used to store the image data and texture map data. When multiple DRAM devices are used, it is desirable to store portions of the image data or texture map data in each of the DRAM devices so that the portions can be accessed in parallel. More data may then be read or written in a single clock cycle, making fuller use of the available memory bandwidth between the frame buffer memory and the graphics processor.
FIG. 1 illustrates a prior art block diagram of DRAM memory 100. The basic unit of DRAM organization is a bank, such as banks 120-0, 120-1, 120-2, and 120-3 of DRAM memory 100. Each bank 120 has a single row buffer 110 that stores a row of data read from, or to be written to, the bank. Row address 115 selects a single row from each bank to load into each row buffer 110 for a read operation or to load from each row buffer 110 for a write operation. Column address 116 selects a portion of a row stored in each row buffer 110. A conventional DRAM memory 100 is configured with 2 Kbytes per row, and column address 116 selects 16 bytes of the 2 Kbyte row for output to or input from data 105.
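The row and column organization described above can be illustrated with a short sketch. The row size (2 Kbytes), the access size (16 bytes), and the bank count (four) come from the description; the function name `decode` and the ordering of the address fields (offset, then column, then bank, then row) are assumptions chosen only for illustration and are not taken from FIG. 1.

```python
ROW_BYTES = 2048       # 2 Kbytes per row (from the description)
ACCESS_BYTES = 16      # bytes selected by one column address (from the description)
NUM_BANKS = 4          # banks 120-0 through 120-3 (from the description)
COLUMNS_PER_ROW = ROW_BYTES // ACCESS_BYTES  # 128 column positions per row


def decode(byte_address):
    """Split a linear byte address into (bank, row, column, offset).

    Assumed field order, least significant first:
    byte offset, column, bank, row. This layout is hypothetical.
    """
    offset = byte_address % ACCESS_BYTES
    unit = byte_address // ACCESS_BYTES
    column = unit % COLUMNS_PER_ROW
    unit //= COLUMNS_PER_ROW
    bank = unit % NUM_BANKS
    row = unit // NUM_BANKS
    return bank, row, column, offset
```

Under this assumed layout, consecutive 2 Kbyte blocks of the address space fall in consecutive banks, and the row index advances only after all four banks have been visited.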
Moving data from a bank 120 to a row buffer 110 requires an activate operation. Similarly, moving data from a row buffer 110 to a bank 120 requires a precharge operation. There is a minimum delay for switching between activate and precharge operations that is typically several clock cycles. Therefore, it is desirable to complete all read and/or write operations for a particular row before switching to a different row.
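The benefit of completing all accesses to one row before moving to another can be seen by counting how often the open row changes within a single bank. The sketch below is a simplified illustration; the function name `count_row_switches` and the sample access sequences are hypothetical.

```python
def count_row_switches(rows):
    """Count how many times a single bank must open a different row.

    `rows` is the sequence of row numbers requested from one bank.
    The first access always opens a row, so it is counted as well.
    """
    switches = 0
    open_row = None
    for row in rows:
        if row != open_row:
            switches += 1
            open_row = row
    return switches


# The same six requests, issued in two different orders.
alternating = [0, 1, 0, 1, 0, 1]
grouped = sorted(alternating)  # [0, 0, 0, 1, 1, 1]
```

Here `count_row_switches(alternating)` returns 6, while `count_row_switches(grouped)` returns 2, so grouping the requests by row reduces the number of row changes by a factor of three in this example.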
Furthermore, because the data bus for transporting data 105 is shared, it is desirable to read or write row buffers 110 in contiguous clock cycles. Interleaving read and write operations for each bank lowers performance because of the overhead required to change the direction of the shared bus when transitioning from reads to writes or from writes to reads. Spreading a sequence of read operations or write operations across banks 120 permits the activate or precharge operations to be hidden. For example, during a read sequence, data loaded into the row buffer 110 corresponding to bank 120-0 may be output as data 105 while bank 120-1 is precharged for a subsequent read in the sequence. When different rows of the same bank are accessed, the precharge and activate delays cannot be hidden.
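The hiding of precharge and activate delays can be approximated with a toy timing model. Everything numeric in the sketch below is an assumption made for illustration: a row change costs four cycles, a data transfer occupies the shared bus for one cycle, and an idle bank may begin changing rows as early as possible while other banks use the bus. Bus turnaround between reads and writes is not modeled, and the function name `total_cycles` is hypothetical.

```python
ROW_SWITCH_CYCLES = 4  # assumed combined precharge + activate cost
TRANSFER_CYCLES = 1    # assumed bus occupancy per access


def total_cycles(accesses, num_banks=4):
    """Return the cycle at which the last access finishes.

    `accesses` is a sequence of (bank, row) pairs served in order.
    A bank may change rows in the background as soon as its own
    previous access is done; the data bus is shared by all banks.
    """
    open_row = [None] * num_banks
    bank_free = [0] * num_banks
    bus_free = 0
    for bank, row in accesses:
        row_ready = bank_free[bank]
        if open_row[bank] != row:
            row_ready += ROW_SWITCH_CYCLES
            open_row[bank] = row
        start = max(bus_free, row_ready)
        bus_free = start + TRANSFER_CYCLES
        bank_free[bank] = bus_free
    return bus_free


same_bank = [(0, row) for row in range(4)]      # four rows of bank 0
spread = [(bank, 0) for bank in range(4)]       # one row in each bank
```

In this model, `total_cycles(same_bank)` is 20 because every access waits for a full row change, whereas `total_cycles(spread)` is 8 because the row changes in banks 1 through 3 overlap with the first bank's activity.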
Accesses may be spread across banks 120 by interleaving data for a graphics surface, such as a texture map or image. As a scanline is traversed during rendering, banks 120 are accessed in sequence. Similarly, as a texture map is read, portions of data may be read from each bank 120. It is desirable to modify the granularity of the interleaving depending on surface characteristics of each graphics surface, such as the number of bits per pixel or texel. It is also desirable to modify the interleaving pattern to spread accesses across multiple banks rather than concentrating accesses within a single bank, which results in “bank conflicts.” Accordingly, the memory addressing mechanism used to read and write a local memory embodied using DRAM memory to store graphics surfaces should allow different interleave patterns that spread accesses across multiple banks and thereby avoid bank conflicts.
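The effect of the interleaving pattern on bank conflicts can be sketched by mapping tiles of a surface to banks in two different ways. Both mappings below are hypothetical examples and are not the addressing mechanism of any particular embodiment: `bank_linear` assigns banks by horizontal tile position only, while `bank_skewed` offsets the assignment by the vertical tile position.

```python
NUM_BANKS = 4  # banks 120-0 through 120-3 (from the description)


def bank_linear(tile_x, tile_y):
    """Assumed pattern: bank depends only on the horizontal tile index."""
    return tile_x % NUM_BANKS


def bank_skewed(tile_x, tile_y):
    """Assumed pattern: each tile row is rotated by one bank."""
    return (tile_x + tile_y) % NUM_BANKS


def back_to_back_same_bank(banks):
    """Count consecutive accesses that land in the same bank."""
    return sum(1 for a, b in zip(banks, banks[1:]) if a == b)


# Walk down one column of tiles, as a vertical access pattern would.
column_walk = [(2, y) for y in range(8)]
linear_banks = [bank_linear(x, y) for x, y in column_walk]
skewed_banks = [bank_skewed(x, y) for x, y in column_walk]
```

For the vertical walk, the linear pattern sends all eight accesses to bank 2, giving seven back-to-back accesses to the same bank, while the skewed pattern cycles through banks 2, 3, 0, and 1 and gives none. A horizontal walk is spread evenly by either pattern, which is why the preferred pattern depends on how a given surface is traversed.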