Graphics rendering, particularly for three dimensional (3D) graphics applications, is one of the most processing intensive activities performed by personal computers. Graphics co-processors are available on most modern day personal computers. FIG. 1A is a system 100 employing a graphics processor 101 and a central processor 102, each coupled to a system memory 103 (e.g., DRAM, eDRAM, etc.) by a bus. Central processor 102 and graphics processor 101 may be disposed on a single piece of silicon (i.e., a single-chip solution), or integrated at a package, board, or system level. Graphics processor 101 includes a plurality of parallel processing sub-systems, or slices 105. Each slice 105 may be replicated any number of times for greater parallel graphics processing power. Within slice 105, there are a number of execution units (EU) 110, also known as “shader cores,” or simply “cores.” Each EU 110 contains scalar integer and floating-point arithmetic units that execute instructions. Each EU 110 has an instruction set architecture (ISA), may support context switching and pre-emptive multi-tasking, and may be essentially a complete x86 core, for example. Along with EUs 110, slice 105 includes a level two (L2) cache 130 (e.g., SRAM, eDRAM, etc.) and texture sampler 120. Texture sampler 120 includes fixed function logic (e.g., state machines). Texture sampler 120 may communicate with EU 110 via cache 130. Cache 130 may function as a texture cache that is a read-only memory to texture sampler 120 holding large arrays of predetermined texture data for use in texture mapping when a graphic is rendered for display by a platform hosting system 100.
The transformation of scene information (source data) into displayable images requires a number of functionalities, referred to in aggregate as a 3D graphics rendering pipeline. FIG. 1B is flow diagram depicting certain operations particular to a texture mapping portion of the graphics rendering pipeline. Texture mapping 101 generally entails imaging a textured signal onto a primitive's geometry, for example giving the appearance of pixel-level detail on more coarsely rendered polygon meshes that are manipulated on a vertex basis. At operation 105, texture coordinates are assigned to vertices of a given polygon. Generally, a texture is a digital image comprising an array of texels (texture elements), which may be individually addressed based on location within a two-dimensional (u,v) coordinate space, or in a three-dimensional (u,v,s) coordinate space. In the (u,v) coordinate space, u is the width and v is the height, and may be mapped between 0 and 1 based on the texture width and height. At operation 107 the texture coordinates are interpolated at each pixel within the polygon. At operation 111, a texture color at each pixel is fetched into cache based on the interpolated texture coordinate. At operation 113, the texture is sampled and filtered to arrive at a particular texel color at each pixel. Often, there is a disparity between a number of sample texture elements (texels) and the source texture image and the number of picture elements (pixels) to which the image is mapped. If a texture is too large or too small for a given polygon, the texture is filtered to fit the space. A magnification filter enlarges (zooms-in) a texture, a minification filter reduces (zooms-out) the texture to fit into a smaller area. Texture magnification maps few texels to many pixels by repeating the sampled texel for a plurality of addresses, for example providing a blurrier image. Texture minification maps many texels to few pixels by combining more than one texel value into a single value. This can cause aliasing or jagged edges, and antialiasing techniques become important to reduce visual artifacts. The goal of texture filtering then is to compute the average value of the image over an area around each pixel, for example through averaging of many texels associated with a given pixel.
Texture filtering has largely been performed by fixed-function logic found in texture sampler 120. Such texture samplers have a fixed filter footprint (shape) associated with a type of texture filtering, such as point sampling, bi-linear filtering, tri-linear filtering, and anisotropic filtering. As the filtering methods become increasingly complex, and as uses for texture data continues to expand, for example, being used for lighting and other surface properties in addition to color, a sampler with a fixed-function filter has become inefficient and/or insufficient. As such, shader programs instantiated by EU 110 have taken larger roles in texture mapping, for example resulting in the architecture of system 106 illustrated in FIG. 1C. In system 106, EU 110 implements a filter footprint 140 in an application layer, and a plurality of texture requests in (u,v) space associated with footprint 140 are sent to the texture sampler 120. Texture sampler 120 then fetches texture data 115 into the cache for each of the (u,v) addresses associated with filter footprint 140. EU 110 executing instructions defined in the application layer then accumulates texture data 115. As such, texture data for an entire footprint is passed through the texture sampler with no data reduction because filtering is off-loaded from fixed-function logic 131 onto EU 110. In this configuration however, sampler chip area occupied by fixed-function logic 131 is wasted. Another issue with this architecture is reduced cache usage efficiency. With multiple sampler messages sent from a kernel there's a higher probability that the interleaved messages from multiple EUs will cause many cache evictions. Furthermore, texture mapping bandwidth is potentially constrained with the greater amount of data communicated between EU 110 and texture sampler 120 since texture data is not significantly processed and/or reduced by texture sampler 120. Another disadvantage of the system architecture depicted in FIG. 1C is that filtering performed by EU 110 may require more power and may be slower than if implemented with optimized, purpose-built logic within texture sampler 120. Therefore, the programmability afforded with shader-based filtering may be particularly disadvantageous for mobile devices executing graphics-intensive applications where the greater power demand translates into reduced battery life.