Graphics processing units (GPUs) typically access arrays of pixels using (X,Y), (X,Y,Z) or (X,Y,sample#) coordinates. Memory addresses are computed by interleaving, or "swizzling," the low-order bits of the coordinate indices, as specified by a resource descriptor. Each cacheline therefore stores a two-dimensional (2D) (or in some cases 3D) region of the pixel array, which greatly reduces the bandwidth needed for graphics rendering operations.
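One common interleaving of this kind is Morton ("Z-order") addressing. The sketch below is illustrative only: real swizzle patterns are defined by the resource descriptor and vary by hardware, but the bit-interleaving idea is the same, and it shows why a 2x2 pixel block lands in consecutive addresses.

```python
def morton_swizzle(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y: x bits take the even
    address-bit positions, y bits the odd ones (a Z-order curve)."""
    addr = 0
    for i in range(bits):
        addr |= ((x >> i) & 1) << (2 * i)
        addr |= ((y >> i) & 1) << (2 * i + 1)
    return addr

# A 2x2 pixel block maps to 4 consecutive addresses:
# (0,0)->0, (1,0)->1, (0,1)->2, (1,1)->3
```

Because neighboring pixels in both X and Y share high-order address bits, a cacheline holds a small square of pixels rather than a long horizontal strip.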
Explicitly managing the working set is vital for graphics applications, since faulting in pages would cause unacceptable screen glitches. GPUs now support a technique called “tiled resources” (in DirectX) or “sparse textures” (in OpenGL). This provides a new user-space page mapping table, so that applications can select 64 KB “tiles” of data to map into process virtual address space (or into dedicated graphics memory). Accesses are translated through this user-mode table before any operating system (OS)-managed page translation. Unmapped tiles read as zero and ignore writes, rather than faulting.
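The unmapped-tile semantics described above (reads return zero, writes are dropped, nothing faults) can be sketched in software. The class and method names below are hypothetical, not a real graphics API; only the 64 KB tile size and the read/write behavior come from the description.

```python
TILE_SIZE = 64 * 1024  # 64 KB tiles, as in DirectX tiled resources

class TiledResource:
    """Hypothetical sketch of a user-mode tile mapping table.
    mapping[tile_index] is a backing buffer, or None if unmapped."""

    def __init__(self, num_tiles: int):
        self.mapping = [None] * num_tiles

    def map_tile(self, tile_index: int):
        # Application chooses which tiles are resident.
        self.mapping[tile_index] = bytearray(TILE_SIZE)

    def read(self, offset: int) -> int:
        tile = self.mapping[offset // TILE_SIZE]
        # Unmapped tiles read as zero rather than faulting.
        return 0 if tile is None else tile[offset % TILE_SIZE]

    def write(self, offset: int, value: int):
        tile = self.mapping[offset // TILE_SIZE]
        # Writes to unmapped tiles are silently ignored.
        if tile is not None:
            tile[offset % TILE_SIZE] = value
```

In hardware this translation happens before any OS-managed page translation, so the application controls residency without the OS being involved on the access path.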
At present there is no means to fully support tiled resources when accessing a graphics resource from the central processing unit (CPU). The only available mechanism is to access individual mapped tiles after performing the tile address translation in application code.
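The workaround mentioned above, performing the tile address translation in application code, might look like the following sketch. The `tile_map` structure and function name are assumptions for illustration; the point is that the application itself must resolve a resource offset to the CPU address of an individually mapped tile.

```python
TILE_SIZE = 64 * 1024  # 64 KB tiles

def cpu_access(resource_offset: int, tile_map: dict) -> int:
    """Translate an offset within a tiled resource to a CPU address.
    tile_map: tile_index -> CPU base address of that mapped tile
    (a hypothetical application-maintained table)."""
    tile_index, tile_offset = divmod(resource_offset, TILE_SIZE)
    base = tile_map.get(tile_index)
    if base is None:
        # The CPU has no fallback: an unmapped tile cannot be
        # transparently read as zero the way the GPU path allows.
        raise ValueError("tile not resident")
    return base + tile_offset
```

The drawback is visible in the sketch: every CPU access must go through this software translation, and unmapped tiles become errors instead of benign zero reads.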
At present there is also no practical way for applications to access swizzled data as if it were a linear array. The only mechanisms available are to copy the data while reordering it, or to write the application to explicitly interleave the array indices, either of which requires the application software to know which swizzle format is in use. A third method, "aperture registers," translates swizzled addresses in hardware, but it is very slow and usable only by the driver because of the limited number of swizzled surfaces that can be supported at once (e.g., six).
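The first mechanism, copying the data while reordering it, can be sketched as a de-swizzling copy. The Morton-order function here stands in for whatever swizzle the format actually uses; as the text notes, the application must know that format for this to work at all.

```python
def morton_swizzle(x: int, y: int, bits: int = 16) -> int:
    """Example swizzle (Z-order); a stand-in for the real format."""
    addr = 0
    for i in range(bits):
        addr |= ((x >> i) & 1) << (2 * i)
        addr |= ((y >> i) & 1) << (2 * i + 1)
    return addr

def deswizzle_copy(src, width: int, height: int, swizzle=morton_swizzle):
    """Copy a swizzled pixel buffer into row-major (linear) order.
    `swizzle(x, y)` must match the surface's actual layout."""
    dst = [0] * (width * height)
    for y in range(height):
        for x in range(width):
            dst[y * width + x] = src[swizzle(x, y)]
    return dst
```

The cost is a full copy of the surface on every transfer between swizzled and linear views, which is exactly what makes this workaround unattractive compared to transparent hardware translation.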