U.S. Pat. No. 5,751,292 granted to Emmot describes a texture for use in displaying surface detail of an object modeled in a computer (column 1, lines 12-15). The computer uses a number of texels (column 7, line 54) that are point elements of a two-dimensional image (referred to as a "texture", e.g. surface detail of leather) and that are mapped onto a surface of a three-dimensional object (column 1, lines 44-53), e.g. a seat (thereby to form the image of a leather seat). Each texel in a texture is normally defined by S and T coordinates (sometimes called "U and V coordinates") of the texel. The S and T coordinates identify the location of the center of a texel relative to the two-dimensional texture (column 1, lines 59-60). For example, texel 12 in FIG. 1A has the coordinates S12 and T12.
To reduce aliasing, texels can be "filtered" (low pass) to obtain a value at the location of a to-be-displayed pixel, using adjacent texels to generate the filtered texel. For example, Emmot states that "for each display screen pixel that is rendered with texture data from a two-dimensional texture map, as many as four texels . . . or eight texels . . . may be accessed from the cache memory to determine the resultant texture data for the pixel" (column 14, lines 22-27).
The above-described filtering of texels can be of three types. As stated by Emmot, "[w]hen a point sampling interpolation mode is established, the resultant texel data equals the single texel that is closest to the location defined by the pixel's S, T coordinates in the texture map. Alternatively, when bilinear or trilinear interpolation is employed, the resultant texel data is respectively a weighted average of the four or eight closest texels . . . The weight given to each of the multiple texels is determined based upon the value of the gradient and [fractional] components of the S and T coordinates provided to the texel interpolator . . . " (column 14, lines 32-41).
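For comparison with the interpolating modes, point sampling can be sketched in a few lines. This is a minimal illustration, not Emmot's circuit: it assumes texel (i, j) has its center at (i + 0.5, j + 0.5) in texel units, that the texture is stored as a list of rows indexed `texture[t][s]`, and that out-of-range coordinates clamp to the edge. The name `point_sample` is hypothetical.

```python
import math


def point_sample(texture, s, t):
    """Return the single texel whose center is closest to (s, t).

    Assumes texel centers at half-integer positions, so the closest center
    along each axis is simply floor(coordinate). Clamp-to-edge addressing
    is an assumption of this sketch.
    """
    i = math.floor(s)  # column index along S
    j = math.floor(t)  # row index along T
    i = min(max(i, 0), len(texture[0]) - 1)
    j = min(max(j, 0), len(texture) - 1)
    return texture[j][i]
```

For instance, `point_sample([[1, 2], [3, 4]], 1.7, 1.1)` selects the lower-right texel, 4; no weighting of neighboring texels takes place.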
Specifically, the intensity I for a point 9 (FIG. 1A) is obtained by bilinear interpolation of four texels 10-13 (also called a "quadruplet" and abbreviated as "quad") that are adjacent to each other. If the four texels 10-13 have intensities $I_0$ through $I_3$, intensity I is given by

$$I = C_t\left[\left(C_s(I_1 - I_0) + I_0\right) - \left(C_s(I_3 - I_2) + I_2\right)\right] + \left(C_s(I_3 - I_2) + I_2\right),$$

where $C_s$ and $C_t$ are the distances of point 9 from the (S,T) coordinates of texel 12. See U.S. Pat. No. 5,706,481 (incorporated by reference herein in its entirety) at column 8, lines 50-59. In bilinear filtering, the four texels 10-13 are from a texture at a single magnification (called "level of detail" and abbreviated as "LOD").
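The bilinear formula translates directly into code. This sketch assumes, as the formula implies, that $I_2$ belongs to the reference texel 12, $I_3$ is its neighbor along S, and $I_0$, $I_1$ are the corresponding texels one step along T, with $C_s$ and $C_t$ expressed as fractions (0 to 1) of the texel spacing. The function name `bilinear` is hypothetical.

```python
def bilinear(i0, i1, i2, i3, cs, ct):
    """Bilinear interpolation of a quad, term-for-term with the formula above.

    i2 is the reference texel, i3 its neighbor along S; i0 and i1 are the
    texels one step along T. cs and ct are the fractional distances of the
    sample point from the reference texel.
    """
    near_row = cs * (i3 - i2) + i2  # interpolate along S in the reference row
    far_row = cs * (i1 - i0) + i0   # interpolate along S in the adjacent row
    return ct * (far_row - near_row) + near_row  # interpolate along T
```

At `cs = ct = 0` the result is `i2`, and `bilinear(10, 20, 30, 40, 0.5, 0.5)` returns 25.0, the mean of the quad, as expected for a point at the quad's center.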
Trilinear filtering combines two filtered texels: a first filtered texel obtained by bilinear interpolation of a first quad at a level of detail L (having an integer value, e.g. 2), and a second filtered texel obtained by bilinear interpolation of a second quad at a level of detail L+1. An interpolation between the first and second filtered texels yields a filtered texel at a third LOD (having a real value between L and L+1, e.g. 2.5). Therefore, trilinear filtering normally requires that a cache address generator 6 (FIG. 1B; see U.S. Pat. No. 5,327,509) generate the addresses of four texels at level of detail L and four texels at level of detail L+1. Cache address generator 6 supplies the eight addresses to a texture pattern memory 7 (FIG. 1B) that holds texels belonging to each of the L and L+1 levels of detail. A texture trilinear interpolator 8 uses the eight texels to perform the interpolation.
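The prior-art scheme can be sketched as two bilinear lookups followed by a blend weighted by the fractional part of the LOD. This is an illustration under stated assumptions: each quad is a 2x2 nested sequence indexed `[row along T][column along S]`, the fractions within each quad are already known, and the names `lerp`, `bilinear`, and `trilinear` are hypothetical. Note that it consumes eight texels, four per level, which is why eight addresses must be generated.

```python
import math


def lerp(a, b, f):
    """Linear interpolation: a at f = 0, b at f = 1."""
    return a + f * (b - a)


def bilinear(quad, fs, ft):
    """Bilinear interpolation of quad[row][col] at fractions (fs, ft)."""
    return lerp(lerp(quad[0][0], quad[0][1], fs),
                lerp(quad[1][0], quad[1][1], fs), ft)


def trilinear(quad_l, frac_l, quad_l1, frac_l1, lod):
    """Blend a bilinear sample at integer LOD L with one at LOD L+1.

    lod is the real-valued level of detail (e.g. 2.5); its fractional part
    p weights the coarser level. frac_l and frac_l1 are the (fs, ft)
    fractions of the sample point within each quad.
    """
    p = lod - math.floor(lod)
    fine = bilinear(quad_l, *frac_l)      # four texels at LOD L
    coarse = bilinear(quad_l1, *frac_l1)  # four texels at LOD L+1
    return lerp(fine, coarse, p)
```

With a fine quad of zeros, a coarse quad of eights, and `lod = 2.5`, the result is 4.0, halfway between the two levels.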
A circuit and process in accordance with the invention perform trilinear filtering using a number (e.g. 4) of texels (called "nearest texels") that are nearest to a to-be-displayed pixel, and also use an additional number (e.g. 12) of texels (called "surrounding texels") that surround the nearest texels. The nearest texels and the surrounding texels are all from only one level of detail L, while a filtered texel generated by the circuit and process is at a level of detail between L and L+1. The filtered texel is used in rendering the to-be-displayed pixel, and can be made identical to a texel obtained by trilinear filtering in the prior art.
In a first embodiment, the circuit and process use the nearest texels and the surrounding texels (all of which are at a level of detail L) to generate a first quad of texels at a coarse level of detail L+1. Thereafter, the generated quad (at the coarse level of detail L+1) is used with a second quad of the nearest texels (at the level of detail L) to perform trilinear filtering. In the first embodiment, generation of the first quad is performed by a coarse texel generator, and interpolation between the two levels of detail L and L+1 is performed by an interpolation circuit; both are included in the circuit (also called "single level trilinear circuit") of the first embodiment.
Specifically, the coarse texel generator has input terminals (hereinafter "fine texel terminals") coupled to two buses, the nearest texel bus and the surrounding texel bus, to receive therefrom a total of sixteen texels at the level of detail L. The coarse texel generator also has an output bus (hereinafter "coarse texel bus") to carry away the quad of coarse texels generated therein. The nearest texels (received from the nearest texel bus) and the surrounding texels (received from the surrounding texel bus) form four quads, wherein all four quads are adjacent to each other and are from the level of detail L, and each quad touches at least two other quads (in a manner similar to the four quadrants of a square). The coarse texel generator includes arithmetic units that average the texels in each of the four quads individually to form four coarse texels that are supplied to the coarse texel bus.
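The coarse texel generator's job can be sketched in software. Here the sixteen texels are assumed to arrive as a single 4x4 array `block[row][col]` at LOD L, whose central 2x2 are the nearest texels and whose outer ring holds the twelve surrounding texels; the function names `split_buses` and `coarse_quad` are hypothetical, and the averaging is a plain 2x2 box filter.

```python
def split_buses(block):
    """Return (nearest, surrounding): the 4 central texels and the 12 ring texels."""
    nearest = [block[r][c] for r in (1, 2) for c in (1, 2)]
    surrounding = [block[r][c] for r in range(4) for c in range(4)
                   if not (r in (1, 2) and c in (1, 2))]
    return nearest, surrounding


def coarse_quad(block):
    """Average each 2x2 quadrant of a 4x4 block of fine texels.

    Each quadrant contains exactly one nearest texel (its inner corner) and
    three surrounding texels; its box average is one coarse texel at LOD L+1.
    """
    assert len(block) == 4 and all(len(row) == 4 for row in block)
    quad = [[0.0, 0.0], [0.0, 0.0]]
    for qr in range(2):
        for qc in range(2):
            r, c = 2 * qr, 2 * qc  # top-left texel of this quadrant
            quad[qr][qc] = (block[r][c] + block[r][c + 1] +
                            block[r + 1][c] + block[r + 1][c + 1]) / 4
    return quad
```

For the block `[[1, 3, 5, 7], [1, 3, 5, 7], [2, 2, 8, 8], [2, 2, 8, 8]]`, `coarse_quad` yields `[[2.0, 6.0], [2.0, 8.0]]`.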
The interpolation circuit has four groups of input terminals. A first group of input terminals (hereinafter "coarse quad terminals") is coupled to the coarse texel bus to receive the quad of coarse texels. A second group of input terminals (hereinafter "fine quad terminals") is coupled to the nearest texel bus to receive a quad of nearest texels. A third group of input terminals (hereinafter "coordinate terminals") is coupled to the coordinate input bus to receive therefrom fractional parts of the S and T coordinates (also called "S and T coordinate fractions") for the filtered texel. A fourth group of input terminals (hereinafter "LOD terminals") is coupled to the level of detail bus to receive therefrom the level of detail fraction. The interpolation circuit also has output terminals (hereinafter "filtered texel output terminals") that are coupled to the texel output bus to supply thereto the filtered texel obtained by interpolation. Specifically, the interpolation circuit performs trilinear interpolation between the four coarse texels from the coarse texel generator and four of the fine texels (one fine texel from each of the four quads), using the texel's S and T coordinate fractions and the level of detail fraction, to generate the filtered texel on the texel output bus.
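One detail the interpolation circuit must handle is that the sample point has different fractions at the two levels. Assuming texel centers at half-integer positions, the nearest texels sit 1.5 and 2.5 fine-texel spacings from the edge of the 4x4 block while the regenerated coarse texels sit at 1 and 3, so a fine fraction f corresponds to the coarse fraction (f + 0.5)/2. The sketch below applies that mapping; the mapping is derived here rather than stated in the text, and `single_level_trilinear` is a hypothetical name.

```python
def lerp(a, b, f):
    """Linear interpolation: a at f = 0, b at f = 1."""
    return a + f * (b - a)


def bilinear(quad, fs, ft):
    """Bilinear interpolation of quad[row][col] at fractions (fs, ft)."""
    return lerp(lerp(quad[0][0], quad[0][1], fs),
                lerp(quad[1][0], quad[1][1], fs), ft)


def single_level_trilinear(fine_quad, coarse_quad, fs, ft, p):
    """Trilinear filter from a nearest (fine) quad and a regenerated coarse quad.

    fs, ft: S and T coordinate fractions within the fine quad.
    p: level-of-detail fraction (0 gives LOD L, 1 gives LOD L+1).
    Assumed geometry: coarse fraction = (fine fraction + 0.5) / 2.
    """
    cs, ct = (fs + 0.5) / 2, (ft + 0.5) / 2
    return lerp(bilinear(fine_quad, fs, ft), bilinear(coarse_quad, cs, ct), p)
```

A quick consistency check on the geometry: for a texture that ramps linearly along S (fine quad `[[1, 2], [1, 2]]`, regenerated coarse quad `[[0.5, 2.5], [0.5, 2.5]]`), both levels agree, so the result is 1 + fs for every p.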
In the first embodiment, texels at the coarse level of detail L+1 are generated twice: a first time to create all texels at the coarse level of detail L+1 (for an initial set of mipmaps), and a second time to create a quad of coarse texels that are used for trilinear interpolation. When the coarse texels are generated the second time, not all texels at the coarse level of detail L+1 are created. Instead, in this embodiment, only the specific quad of coarse texels that is required at the moment for trilinear interpolation is created.
The regeneration of coarse texels (i.e. generation of the coarse texels a second time) is performed in the coarse texel generator that is included in a texture subsystem of a graphics processor, and the resulting coarse texels are used directly (without storage in main memory) by the interpolation circuit (also included in the texture subsystem) to perform trilinear filtering. In contrast, the first act of generating coarse texels (for the initial set of mipmaps) is performed elsewhere (e.g. in a central processing unit (CPU)), and thereafter the coarse texels are stored in memory. At some later time, the coarse texels are conventionally fetched into a texture cache and used with fine texels in trilinear interpolation (performed without regeneration). Alternatively, in the first embodiment, a quad of coarse texels is freshly generated (in the act called "regenerating"), which eliminates the use of previously-generated coarse texels (that remain in memory). Note that only the quad that is necessary for trilinear interpolation is generated by the coarse texel generator. Note further that the previously-generated coarse texels (at level of detail L+1) are used in the first embodiment only when regenerating even coarser texels (at level of detail L+2) for use in trilinear filtering (between levels L+1 and L+2).
Regeneration of coarse texels (i.e., generation of coarse texels a second time) as described herein requires the bus from the texture cache to be wider, e.g., to carry sixteen texels instead of the eight texels (four at each level of detail) required in conventional trilinear filtering, and further requires additional hardware, e.g., in the coarse texel generator. However, regeneration eliminates hardware that may otherwise be required in the prior art. For example, regeneration eliminates circuitry required in a cache address generator to simultaneously generate addresses of the coarse texels and of the fine texels. Regeneration also eliminates storage elements required in a texture cache to temporarily hold the coarse texels. Such regeneration may reduce memory bandwidth by reducing or eliminating the fetching of coarse texels into the texture cache that may otherwise be required in the prior art. Depending on the implementation, the savings in memory bandwidth, address generation hardware, and cache size can outweigh any extra circuitry required for regenerating the coarse texels.
In one variant of the first embodiment, a filter of the same order (e.g. a linear filter such as a box filter) is used in both generation and regeneration of coarse texels. In one specific implementation, the nearest texels and the surrounding texels form four quads (wherein each quad touches at least two other quads), and the four quads are each averaged individually (during regeneration) to form four coarse texels. In this implementation, the four coarse texels created by such averaging are identical to texels obtained during the first act of generating coarse texels if the exact same filter is used in both generation and regeneration of coarse texels.
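The identity claimed for the same-filter case can be checked with a small sketch. It assumes a 2x2 box filter, a texture stored as a list of rows with even dimensions, and a 4x4 block starting on an even row and column, so that its quadrants line up with texels of level L+1; `box_downsample` and `regenerate_quad` are hypothetical names.

```python
def box_downsample(level):
    """Build the next-coarser level from `level` with a 2x2 box filter."""
    return [[(level[2 * r][2 * c] + level[2 * r][2 * c + 1] +
              level[2 * r + 1][2 * c] + level[2 * r + 1][2 * c + 1]) / 4
             for c in range(len(level[0]) // 2)]
            for r in range(len(level) // 2)]


def regenerate_quad(level, r0, c0):
    """Regenerate only the 2x2 coarse quad from the 4x4 fine block at (r0, c0).

    Uses exactly the same filter as the full mipmap generation, so for an
    aligned block (even r0 and c0) the result equals the stored L+1 texels.
    """
    block = [row[c0:c0 + 4] for row in level[r0:r0 + 4]]
    return box_downsample(block)
```

Because both paths sum the same four texels in the same order before dividing by 4, the regenerated quad matches the corresponding stored texels exactly, not merely to within rounding.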
However, in other variants, filters of different orders are used. For example, the first act of generation is done with a Gaussian filter (because speed and the number of gates are not critical when mipmaps are being generated off-line), and the second act of generation is done with a box filter (because the resulting quad of coarse texels normally needs to be created within a graphics processor that functions within constraints, e.g. speed and gate count, imposed by real-time display). Note that such use of different filters may result in a filtered texel that is slightly different from one obtained by conventional trilinear filtering.
In a second embodiment, the circuit and process use the nearest texels and the surrounding texels at a fine level of detail L to directly generate a filtered texel, without generation of the quad of coarse texels at a coarse level of detail L+1 (as described above for the first embodiment). One implementation of the circuit (also called "single level trilinear circuit") includes a coefficient generator that uses the texel coordinate fractions and the level of detail fraction p to generate coefficients, and a multiply-add circuit that receives the coefficients from the coefficient generator and uses the coefficients to generate the filtered texel. The multiply-add circuit includes a number of adders that are coupled to the surrounding texel bus. Each adder receives three texels from the surrounding texel bus and supplies to a multiplier (included in the multiply-add circuit) a summed texel obtained by adding the three texels. All such multipliers in the multiply-add circuit are coupled to the plurality of adders to receive therefrom the summed texels, and to the nearest texel bus to receive therefrom the nearest texels. The multiply-add circuit performs a sum of products to generate the filtered texel. Specifically, the multipliers multiply the summed texels and the nearest texels with the respective coefficients, and an adder coupled to the multipliers adds the products thereby to generate the filtered texel.
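The text does not list the coefficients, but one consistent set follows from expanding the first embodiment's result. Assuming box-filtered coarse texels and the texel-center geometry in which a fine fraction f maps to the coarse fraction (f + 0.5)/2, each coarse texel is one quarter of (its quadrant's nearest texel + the sum of its three surrounding texels). The filtered texel therefore becomes eight products: four nearest texels and four three-texel sums, one per adder. The names `coefficients` and `multiply_add` are hypothetical, and this models only the arithmetic, not the gate-level design.

```python
def coefficients(fs, ft, p):
    """Hypothetical coefficient generator.

    Returns (a, b), dicts keyed by quadrant (qr, qc): a[q] multiplies the
    quadrant's nearest texel, b[q] multiplies its summed surrounding texels.
    """
    cs, ct = (fs + 0.5) / 2, (ft + 0.5) / 2  # assumed coarse-level fractions
    a, b = {}, {}
    for qr in (0, 1):
        for qc in (0, 1):
            w_fine = (fs if qc else 1 - fs) * (ft if qr else 1 - ft)
            w_coarse = (cs if qc else 1 - cs) * (ct if qr else 1 - ct)
            b[qr, qc] = p * w_coarse / 4  # box average spreads weight over 4 texels
            a[qr, qc] = (1 - p) * w_fine + b[qr, qc]
    return a, b


def multiply_add(block, fs, ft, p):
    """Sum of eight products over a 4x4 block of LOD-L texels, block[row][col]."""
    a, b = coefficients(fs, ft, p)
    total = 0.0
    for qr in (0, 1):
        for qc in (0, 1):
            nr, nc = 1 + qr, 1 + qc  # nearest texel: inner corner of the quadrant
            # Adder: sum the three surrounding texels of this quadrant.
            summed = sum(block[r][c]
                         for r in (2 * qr, 2 * qr + 1)
                         for c in (2 * qc, 2 * qc + 1)
                         if (r, c) != (nr, nc))
            total += a[qr, qc] * block[nr][nc] + b[qr, qc] * summed
    return total
```

Since no coarse quad is ever formed, the two bilinear steps and the LOD blend collapse into one sum of products, which is what allows the operations to run in parallel.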
The second embodiment has advantages (over the prior-art use of previously-generated coarse texels) that are similar or identical to the advantages of the first embodiment described herein. Moreover, under certain conditions, the second embodiment requires fewer gates for implementation than the first embodiment. The second embodiment also has lower latency, because its arithmetic operations are performed in parallel, whereas one implementation of the first embodiment performs such operations serially.
The single level trilinear circuit described above can be used either directly or only when a mode indicates that trilinear filtering is to be performed using texels of a single level. The mode can be set by a software driver process (executed in a CPU) that regenerates a coarse texel from a number of fine texels by a method identical to the method (e.g. box filter) used by hardware in the graphics processor, and compares the regenerated texel with the corresponding coarse texel that pre-exists in a mipmap at the level of detail L+1. In case of a match, the coarse texels are regenerated by the single level trilinear circuit (e.g., in a single cycle). In case of no match, the single level trilinear circuit performs trilinear filtering using the pre-existing coarse texels of the L+1 mipmap (e.g., in two cycles, by inverting the fractional level of detail p in one of the two cycles). Trilinear filtering using pre-existing coarse texels may be necessary, e.g. if texels in the L+1 mipmap were generated by a filter other than a box filter (such as a sinc filter, a Gaussian filter, or a Bartlett filter). The process may be implemented in two different circuits, e.g. a central processing unit (CPU) that compares the regenerated texel with the pre-existing texel and sets the mode, and a graphics processor that is responsive to the mode.
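The driver-side mode check can be sketched as follows. This is a hypothetical illustration: it assumes the hardware regenerates with a 2x2 box filter, that mipmap levels are lists of rows, and that an exact comparison (`tol=0.0`) is wanted; it compares every texel of level L+1, although a driver could equally compare only a sample. The names `box_downsample` and `choose_mode` and the mode strings are invented for the sketch.

```python
def box_downsample(level):
    """Build the next-coarser level from `level` with a 2x2 box filter."""
    return [[(level[2 * r][2 * c] + level[2 * r][2 * c + 1] +
              level[2 * r + 1][2 * c] + level[2 * r + 1][2 * c + 1]) / 4
             for c in range(len(level[0]) // 2)]
            for r in range(len(level) // 2)]


def choose_mode(level_l, level_l1, tol=0.0):
    """Hypothetical driver check.

    Returns "single_level" if the stored L+1 mipmap matches a box-filter
    regeneration from level L (hardware may regenerate coarse texels),
    otherwise "two_level" (filter with the pre-existing L+1 texels).
    """
    regenerated = box_downsample(level_l)
    if len(regenerated) != len(level_l1) or any(
            len(x) != len(y) for x, y in zip(regenerated, level_l1)):
        return "two_level"  # shapes differ, so not a box reduction of level_l
    match = all(abs(x - y) <= tol
                for reg_row, stored_row in zip(regenerated, level_l1)
                for x, y in zip(reg_row, stored_row))
    return "single_level" if match else "two_level"
```

For example, an L+1 level built by simply keeping every other texel (`[row[::2] for row in level[::2]]`) rather than averaging fails the check, so filtering falls back to the stored coarse texels.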