1. Field of the Invention
The present invention generally relates to computer graphics and more particularly to a method and system for processing texture samples with programmable filter weights.
2. Description of the Related Art
Unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
Conventional graphics systems use texture mapping to add realism to a computer-generated scene. During a texture mapping operation, a texture lookup is generally performed to determine where on a texture map each pixel center falls. A pixel usually does not correspond to exactly one texture map element, also referred to as a texture sample or a texel. Thus, to calculate the optimal color for the pixel, some form of filtering involving multiple texels is performed.
FIG. 1A illustrates a portion of a graphics processing unit (GPU) conventionally involved in a texture filtering operation. This portion of the GPU includes a pixel shader engine 102, a texture unit 104, and memory 106. Pixel shader engine 102 executes a shader program that issues a texture mapping instruction to texture unit 104. In response to the instruction, texture unit 104 fetches the necessary texels from memory 106 and performs the necessary filtering operation using the fetched texels.
One technique commonly used in this texture filtering operation is bilinear interpolation, which interpolates among four texels to generate the final color value for a pixel. To illustrate, in FIG. 1B, px represents a texture coordinate on the texture map 122. Suppose px is surrounded by four nearby texels p0, p1, p2, and p3 with the colors C0, C1, C2, and C3, respectively. One can calculate the texel color at px by performing a bilinear interpolation as follows: (1) calculating the filter weights w0, w1, w2, and w3 for the four surrounding texels based on their distances to px, (2) applying the filter weights to the colors of the texels, and (3) summing the weighted colors. Here the interpolated color at px is referred to as Cx.
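Steps (1) through (3) can be sketched as follows. This is a minimal illustration, not the hardware's actual formulae: the texel layout (p0 top-left, p1 top-right, p2 bottom-left, p3 bottom-right), the fractional offsets fx and fy, and the function names bilinear_weights and weighted_sum are all assumptions introduced here.

```python
from typing import Sequence, Tuple

Color = Tuple[float, float, float]


def bilinear_weights(fx: float, fy: float) -> Tuple[float, float, float, float]:
    """Step (1): return (w0, w1, w2, w3) for a sample at offset (fx, fy).

    Assumed layout (not stated in the text): p0 top-left, p1 top-right,
    p2 bottom-left, p3 bottom-right, with fx and fy in [0, 1] measured
    from p0. A texel closer to the sample receives a larger weight.
    """
    return ((1 - fx) * (1 - fy), fx * (1 - fy), (1 - fx) * fy, fx * fy)


def weighted_sum(colors: Sequence[Color], weights: Sequence[float]) -> Color:
    """Steps (2) and (3): scale each texel color by its weight, then sum."""
    return tuple(sum(w * c[ch] for c, w in zip(colors, weights)) for ch in range(3))
```

With these standard bilinear weights, the four values always sum to one, so Cx is a convex combination of C0 through C3.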
A prior art approach where the aforementioned steps are performed using the hardware shown in FIG. 1A has certain limitations. This “first approach” involves issuing a single TEX shader program instruction from pixel shader engine 102 to texture unit 104 to trigger the bilinear interpolation. However, in this approach, texture unit 104 calculates all the filter weights internally based on the positions of the four texels in the texture map relative to the pixel and does not afford the user any opportunity to specify the filter weights. For example, suppose the instruction issued by the shader program running on pixel shader engine 102 is TEX R0, px, texture[122], where R0 is the placeholder for the computed color value at texture coordinate px on texture map 122 as shown in FIG. 1B. In response to this TEX instruction, texture unit 104 issues four separate read requests to memory 106 to fetch the texel colors Ci for each of the four texels used in the bilinear interpolation (i.e., C0, C1, C2, and C3). After receiving the requested texel colors Ci, texture unit 104 computes the color value R0 by performing steps (1)-(3) described above. Here, texture unit 104 calculates the filter weights based on fixed formulae using the distances between the location of px in texture map 122 and the location of each of the four texels p0, p1, p2, and p3. In other words, this first approach relies solely on hardware-generated filter weights to carry out the bilinear interpolation and provides neither the flexibility nor the image quality associated with filtering schemes that implement programmable filter weights.
Although the first approach may be relatively simple to implement, it can produce poor results in certain graphics applications. For example, in real-time applications that magnify a texture, the first approach may yield exceedingly blurry images. To alleviate this problem, Pradeep Sen in his article, “Silhouette Maps for Improved Texture Magnification,” discusses a filtering method in which discontinuity information in a texture map is specified (the “second approach”). FIG. 1C illustrates a scenario in which the benefits of the second approach over the first approach can be demonstrated. In the first approach, even though the screen pixel R1 resides in the region of a texture map 124 that is entirely red, the colors of all four texels, C0, C1, C2, and C3, would still contribute to the final texture value for px. The resulting texture value therefore would not be exactly red, and this imprecise color would be especially noticeable under magnification. In the second approach, on the other hand, boundary edge 126, which delineates a color discontinuity between red on the right side of the edge and blue on the left side of the edge, can be specified. Boundary edge 126 breaks up texture map 124 into different regions. The samples located on the same side of the boundary are grouped together in a filtering operation. So, because px resides on the same red side as p1 and p3, only C1 and C3 are fetched and filtered to compute the texture value at px. The resulting texture value, unlike that of the first approach, would contain the precise red color in this example. It is worth noting that by specifying a discontinuity, such as a boundary edge, the filter weights are also specified. For instance, by specifying boundary edge 126, the filter weights for C0 and C2 would be programmed to zero, because they do not contribute at all to the calculation of the texture value for R1.
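The effect of a boundary edge on the filter weights can be sketched as follows. This is an illustrative assumption rather than the method of the cited article: the function name mask_weights and the same_side flags are hypothetical, and renormalizing the surviving weights so that they sum to one is a choice made here, since the text states only that the weights for C0 and C2 become zero.

```python
from typing import Sequence, Tuple


def mask_weights(weights: Sequence[float], same_side: Sequence[bool]) -> Tuple[float, ...]:
    """Zero the weights of texels across the boundary from the sample point.

    same_side[i] is True when texel i lies on the same side of the boundary
    edge as px. The remaining weights are renormalized to sum to 1, which is
    an assumption of this sketch.
    """
    kept = [w if keep else 0.0 for w, keep in zip(weights, same_side)]
    total = sum(kept)
    if total == 0.0:
        raise ValueError("no texel lies on the sample's side of the boundary")
    return tuple(w / total for w in kept)
```

For the scenario of FIG. 1C, passing flags that keep only p1 and p3 leaves nonzero weights for C1 and C3 alone, so a sample in the red region is filtered from red texels only.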
Even though the second approach supports a programmable and more intelligent filtering method than the first approach, the second approach implemented using the hardware shown in FIG. 1A still has some shortcomings. In particular, texture unit 104 no longer computes the final color value Cx, but rather transmits the color values of the four texels, C0, C1, C2, and C3, to pixel shader engine 102 for processing. This distribution of processing may lead to inefficient use of memory 106. To illustrate, implementing the second approach using the hardware of FIG. 1A and operating on texture map 122 shown in FIG. 1B would require the following instructions:
# initialize Cx′ to 0
(1) TEX C0, p0, texture[122]
(2) TEX C1, p1, texture[122]
(3) TEX C2, p2, texture[122]
(4) TEX C3, p3, texture[122]
(5) MAD Cx′, C0, w0′, Cx′
(6) MAD Cx′, C1, w1′, Cx′
(7) MAD Cx′, C2, w2′, Cx′
(8) MAD Cx′, C3, w3′, Cx′
The shader program issues the first four TEX shader program instructions to texture unit 104 to essentially retrieve the four texel colors, C0, C1, C2, and C3. Then the shader program issues the next four MAD instructions to pixel shader engine 102 with the user-specified filter weights w0′, w1′, w2′, and w3′ to compute the final output color stored in Cx′. So, even though the filter weights would be programmable via the MAD instructions, performing bilinear interpolation with these user-specified filter weights would require eight instructions. The first four instructions are executed by texture unit 104, and the second four instructions are executed by pixel shader engine 102. Moreover, because of the multi-threaded nature of pixel shader engine 102, even though the texture cache may have, in anticipation of cache access locality, prefetched C1, C2, and C3 into the cache after instruction (1) is executed, these values very likely would have been flushed out of the cache by other intervening threads before instruction (2) is executed. With cache misses, memory 106 would need to be accessed more frequently, adding even more clock cycles to the already high number of clock cycles needed to execute the eight instructions, resulting in performance inefficiencies and increased power consumption for the GPU.
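The eight-instruction sequence can be emulated in software as a rough sketch. The TextureUnit class, its request counter, and the mad and filter_in_shader helpers are hypothetical stand-ins introduced for illustration; they do not model the texture cache or thread scheduling, and the texel names p0 through p3 follow the naming used for FIG. 1B.

```python
from typing import Dict, List, Tuple

Color = Tuple[float, float, float]


class TextureUnit:
    """Hypothetical stand-in for texture unit 104; counts TEX requests."""

    def __init__(self, texels: Dict[str, Color]) -> None:
        self.texels = texels
        self.requests = 0

    def tex(self, texel: str) -> Color:
        """Emulate a TEX that returns one unfiltered texel color."""
        self.requests += 1
        return self.texels[texel]


def mad(a: Color, w: float, acc: Color) -> Color:
    """Emulate MAD dst, a, w, acc: per-channel a * w + acc."""
    return tuple(x * w + y for x, y in zip(a, acc))


def filter_in_shader(unit: TextureUnit, weights: List[float]) -> Color:
    """Run the sequence: four TEX fetches, then four MAD accumulations."""
    fetched = [unit.tex(name) for name in ("p0", "p1", "p2", "p3")]  # (1)-(4)
    cx = (0.0, 0.0, 0.0)  # initialize Cx' to 0
    for color, w in zip(fetched, weights):  # (5)-(8)
        cx = mad(color, w, cx)
    return cx
```

In this sketch each filtered sample costs four separate requests to the texture unit, whereas the first approach issues a single TEX instruction per sample; that difference is the overhead the paragraph above describes.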
As the foregoing illustrates, what is needed in the art is a more efficient technique for processing texture samples with programmable filter weights.