1. Field of the Invention
The present invention generally relates to graphics processing and, more specifically, to techniques for interleaving surfaces.
2. Description of the Related Art
A graphics processing unit (GPU) is often configured to perform specific tasks included in the graphics pipeline (the collection of processing steps performed to transform 3-D images into rendered 2-D images). GPUs typically represent the surface appearance of objects using 2-D surfaces, such as textures. In general, textures include a variety of data, such as color or transparency, which may vary across the surface of the object. In particular, each texture includes an array of texels representing the value of characteristics at particular points on the surface of the object. To realistically portray a complex graphics scene in fine detail often requires many large textures and a correspondingly large amount of memory. Because the amount of physical memory local to the GPU (local memory) is limited, rendering such complex graphics scenes often exceeds the capacity of the local memory. Accessing physical memory that is not local to the GPU, such as system memory, introduces relatively large latencies and reduces the throughput of the graphics pipeline.
Local memory limitations may be further exacerbated by how memory is typically allocated. Memory is typically allocated in discrete units, such as pages, of a fixed size (e.g., 32 KB, 64 KB, etc.) that are contiguous within the physical memory space. Typically, to store the data associated with a texture in physical memory, the texture is divided into tiles of data, where the size of each tile matches the size of a page. Some textures are “sparse,” meaning that the textures include many areas that do not include any data (e.g., areas where there are no visible objects). In addition, many sparse textures include partially filled tiles (e.g., only the upper half of the tile includes useful data). Since a full page of memory is allocated for each tile that includes any useful data, each partially filled tile wastes physical memory. For example, suppose that the size of a page were 64 KB. Further, suppose that a texture were to include 2 tiles, with each tile including 1 KB of data. In such a scenario, the GPU would allocate two pages of local memory (a total of 128 KB) to represent the 2 KB of data, thereby wasting 126 KB of physical memory.
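The arithmetic in the foregoing example can be reproduced with a short sketch. The function names below are hypothetical and serve only as illustration; the sketch assumes that every tile containing any data is backed by exactly one full page.

```python
KB = 1024

def allocated_bytes(tile_data_sizes, page_size):
    """Physical memory allocated when each non-empty tile receives one full page."""
    return sum(page_size for size in tile_data_sizes if size > 0)

def wasted_bytes(tile_data_sizes, page_size):
    """Allocated memory that holds no useful data."""
    return allocated_bytes(tile_data_sizes, page_size) - sum(tile_data_sizes)

# Two tiles with 1 KB of data each, 64 KB pages (the example above).
tiles = [1 * KB, 1 * KB]
page = 64 * KB
print(allocated_bytes(tiles, page) // KB)  # 128
print(wasted_bytes(tiles, page) // KB)     # 126
```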
One approach to addressing the above problems is to decrease the size of the pages when allocating physical memory. Decreasing the page size increases the likelihood that the empty regions of a sparse texture fall on entirely empty pages, which need not be allocated and therefore waste no physical memory. For instance, suppose that the size of a page were 64 KB and a texture “TA” were to include two 64 KB tiles, where only the top half of each tile included useful data. The GPU would allocate 128 KB to represent the texture “TA.” In contrast, suppose that the page size were reduced to 32 KB. The texture “TA” would then include two 32 KB pages that included useful data and two 32 KB empty pages. Consequently, the GPU would allocate only 64 KB to represent the texture “TA.”
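The effect of the page size on the texture “TA” can be sketched as follows. This is an illustrative model only: the texture is represented as a hypothetical list of 32 KB chunks flagged as holding data or not, and the page size is assumed to be a multiple of the chunk size.

```python
KB = 1024

def allocated_bytes(occupancy, chunk_size, page_size):
    """Memory allocated when every page containing at least one occupied chunk is backed."""
    chunks_per_page = page_size // chunk_size  # assumes page_size is a multiple of chunk_size
    pages = [occupancy[i:i + chunks_per_page]
             for i in range(0, len(occupancy), chunks_per_page)]
    return sum(page_size for page in pages if any(page))

# Texture "TA": two 64 KB tiles, only the top half (first 32 KB chunk) of each holds data.
ta = [True, False, True, False]

print(allocated_bytes(ta, 32 * KB, 64 * KB) // KB)  # 128
print(allocated_bytes(ta, 32 * KB, 32 * KB) // KB)  # 64
```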
One limitation of reducing the page size is that smaller page sizes reduce overall memory efficiency. More specifically, GPUs employ virtual memory to enable processes to address more memory than is supported by the available local physical memory. The virtual memory is typically allocated in pages of the same size as the pages of physical memory. A TLB (translation lookaside buffer) is used to speed up the translation between virtual pages and physical pages. However, a TLB is limited in size and, thus, only represents a subset of the virtual memory pages. When a virtual memory page that is not included in the TLB is accessed, a “TLB miss” occurs and the memory efficiency is degraded. As the page allocation size decreases, the range of memory addresses that the TLB spans also decreases and, consequently, the likelihood of TLB misses increases. The resulting decrease in memory efficiency due to TLB misses may very well exceed the increase in memory efficiency attributable to reducing the page size when working with sparse textures.
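The relationship between page size and the address range spanned by the TLB can be illustrated with a minimal sketch. The entry count of 512 is a hypothetical value chosen for illustration, not a figure for any particular GPU, and the sketch assumes that each TLB entry maps exactly one page.

```python
KB = 1024
MB = 1024 * KB

def tlb_reach(num_entries, page_size):
    """Bytes of address space translatable without a TLB miss (one page per entry)."""
    return num_entries * page_size

ENTRIES = 512  # hypothetical TLB size

print(tlb_reach(ENTRIES, 64 * KB) // MB)  # 32
print(tlb_reach(ENTRIES, 32 * KB) // MB)  # 16
```

Under these assumptions, halving the page size halves the span of the TLB, so a working set that previously fit within the TLB may no longer do so.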
In another approach to allocating physical memory, pages are shared between certain textures. More specifically, two or more textures that share the same sparse allocation pattern, the same size, and the same shape are assigned to share particular pages in both virtual and physical memory. For example, suppose that both a texture “A” and a texture “B” were to include useful data in the same set of texels and in no other texels. The GPU would be configured to divide the textures into tiles corresponding to half the page size. The GPU would then map the texture “A” and the texture “B” to the same virtual address space, mapping the data included in texture “A” to the top half of each page and the data included in texture “B” to the bottom half of each page. Reducing the tile size and then packing the tiles into pages in virtual memory increases the number of empty pages and reduces the amount of allocated physical memory. However, this approach is limited to textures that share the same sparse allocation pattern, the same size, and the same shape. Consequently, rendering many complex scenes is still adversely impacted by allocating physical memory to sparse textures with differing allocation patterns, sizes, and shapes.
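The page-sharing approach, and the reason it depends on matching allocation patterns, can be sketched as follows. The model is an assumption made for illustration: each texture is a hypothetical list of half-page tiles flagged as holding data or not, and page i holds tile i of texture “A” in its top half and tile i of texture “B” in its bottom half.

```python
def shared_pages_allocated(pattern_a, pattern_b):
    """Pages backed by physical memory when A and B share pages tile-for-tile."""
    if len(pattern_a) != len(pattern_b):
        raise ValueError("page sharing requires textures of the same size and shape")
    return sum(1 for a, b in zip(pattern_a, pattern_b) if a or b)

def half_pages_wasted(pattern_a, pattern_b):
    """Half-pages allocated but left empty because only one texture has data there."""
    return sum(1 for a, b in zip(pattern_a, pattern_b) if a != b)

a = [True, False, False, True]
b_same = [True, False, False, True]   # same sparse allocation pattern as A
b_diff = [False, True, True, False]   # differing allocation pattern

print(shared_pages_allocated(a, b_same), half_pages_wasted(a, b_same))  # 2 0
print(shared_pages_allocated(a, b_diff), half_pages_wasted(a, b_diff))  # 4 4
```

In this sketch, matching patterns leave every allocated page fully used, whereas differing patterns leave half of every allocated page empty, which is the limitation noted above.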
As the foregoing illustrates, what is needed in the art is a more efficient and flexible technique to allocate physical memory, especially when working with sparse textures.