1. Field of the Invention
This invention relates generally to the field of computer graphics and, more particularly, to high performance graphics systems.
2. Description of the Related Art
A host computer may rely on a graphics system for producing visual output on a display device. The graphics system may receive graphics data (e.g. triangle data) from the host computer, and may generate a stream of output pixels in response to the graphics data. The stream of output pixels may be stored in a frame buffer, and then dumped from the frame buffer to a display device such as a monitor or projection screen.
To obtain images that are more realistic, some prior art graphics systems have gone further by generating more than one sample per pixel. As used herein, the term “sample” refers to calculated color information that indicates the color, depth (z), and potentially other information, of a particular point on an object or image. For example, a sample may comprise the following component values: a red value, a green value, a blue value, a z value, and an alpha value (e.g., representing the transparency of the sample). A sample may also comprise other information, e.g., a blur value, an intensity value, brighter-than-bright information, and an indicator that the sample consists partially or completely of control information rather than color information (i.e., “sample control information”). By calculating more samples than pixels (i.e., super-sampling), a more detailed image is calculated than can be displayed on the display device. For example, a graphics system may calculate four samples for each pixel to be output to the display device. After the samples are calculated, they are then combined or filtered to form the pixels that are stored in the frame buffer and then conveyed to the display device. Using pixels formed in this manner may create a more realistic final image.
These prior art super-sampling systems typically generate a number of samples that is far greater than the number of pixel locations on the display. These prior art systems typically have rendering processors that calculate the samples and store them into a render buffer. Filtering hardware then reads the samples from the render buffer, filters the samples to create pixels, and then stores the pixels in a traditional frame buffer that is used to refresh the display device. The traditional frame buffer is typically double-buffered, with one side being used for refreshing the display device while the other side is updated by the filtering hardware. These systems, however, have generally suffered from limitations imposed by the conventional frame buffer and by the added latency caused by the render buffer and filtering. Therefore, an improved graphics system is desired which includes the benefits of pixel super-sampling while avoiding the drawbacks of the conventional frame buffer.
However, one potential obstacle to an improved graphics system is that the filtering operation may be computationally intensive. A high-resolution graphics card and display may need to support millions of pixels per frame, and each pixel may be generated by filtration of a number of samples. This typically translates into a large number of calculations. In particular, each pixel component such as red, green and blue may be generated by constructing a weighted sum of the corresponding sample components. However, it is important to guarantee that the filter weights used to generate the weighted sums do not introduce color gain or attenuation. In other words, if the filter weights are not appropriately chosen, a group of samples all having identical red intensity Xr may have a weighted sum equal to kXr where k is not equal to one. This implies that the resulting red pixel value will be more or less intense than desired. Thus, there is a substantial need for a system and method which could provide for unity gain in the filtering process (i.e. in the process of generating pixel values from sample values) in a manner which is flexible and efficient.
Furthermore, because each pixel comprises a number of components such as red, green, and blue, the filtering process may require multiple summations to be performed per pixel. Thus, there exists a need for a system and method which may efficiently and flexibly perform summations.
The present invention comprises a computer graphics system configured to receive 3D graphics data, generate samples in response to the 3D graphics data, filter the samples to generate output pixels, and provide the output pixels to a display device such as a monitor or projector. In some embodiments, the graphics system comprises a sample buffer, a graphics processor configured to render (or draw) the samples into the sample buffer, and a sample-to-pixel calculation unit. The sample-to-pixel calculation unit may be responsible for filtering samples to generate pixel values.
The graphics processor may perform various rendering operations on the received 3D graphics data (e.g. triangle data) to generate samples based on a selected set of sample positions in a 2D screen space. Each sample may comprise a set of values such as red, green and blue. The samples are stored into the sample buffer for subsequent access by the sample-to-pixel calculation unit. The graphics processor preferably generates and stores more than one sample per unit pixel area in the 2D screen space for at least a subset of the output pixels. Thus, the sample buffer may be referred to as a super-sampled (or “over-sampled”) sample buffer. In other embodiments, the graphics processor may generate one sample per unit pixel area, or less than one sample per unit pixel area (e.g. one sample for every two pixels). In one embodiment, the samples may be more densely positioned in certain areas of the screen space and less densely positioned in other areas.
The sample-to-pixel calculation unit is configured to read the samples from the sample buffer and filter (or convolve) the samples into respective output pixels. The output pixels are then provided to refresh the display. As used herein, the terms “filter” and “convolve” are used interchangeably, and refer to the process of generating a pixel value by computing a weighted average of a corresponding set of sample values. The sample-to-pixel calculation unit filters samples based on a filter function which may be centered over a current pixel location in the screen space. The filter function has an associated domain of definition referred to herein as the filter support or filter extent. The sample-to-pixel calculation unit:
(a) selects those samples which fall within the filter support in screen space,
(b) generates filter weights for each of the “interior” samples based on the filter function, and
(c) computes a weighted average of interior sample values for each pixel attribute (such as red, green, blue and alpha) using the filter weights.
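Steps (a) through (c) may be illustrated in software by the following minimal sketch. It is not the hardware described herein; the function names `filter_pixel` and `cone_filter`, the tuple layout of a sample, and the choice of a circular support with a cone-shaped filter function are all assumptions made for illustration.

```python
import math

def cone_filter(distance, radius=1.0):
    """Hypothetical filter function: weight falls linearly to zero at the radius."""
    return max(0.0, 1.0 - distance / radius)

def filter_pixel(samples, center, radius, filter_fn):
    """Filter samples into one pixel.

    samples: list of (x, y, (r, g, b, a)) tuples (assumed layout).
    center:  (cx, cy) filter center in screen space.
    Returns an (r, g, b, a) tuple, or None if no sample contributes.
    """
    cx, cy = center
    weighted = [0.0, 0.0, 0.0, 0.0]
    weight_sum = 0.0
    for x, y, rgba in samples:
        d = math.hypot(x - cx, y - cy)
        if d > radius:            # (a) keep only samples inside the support
            continue
        w = filter_fn(d)          # (b) weight from the filter function
        weight_sum += w
        for i, value in enumerate(rgba):
            weighted[i] += w * value
    if weight_sum == 0.0:
        return None
    # (c) weighted average per attribute: weighted sum / sum of weights
    return tuple(s / weight_sum for s in weighted)
```

Because each attribute sum is divided by the sum of the weights, a group of interior samples sharing one color yields that same color at the pixel, whatever filter function is supplied.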
The sample-to-pixel calculation unit may access samples from the sample buffer, perform a real-time filtering operation, and then provide the resulting output pixels to the display in real-time. The graphics system may operate without a conventional frame buffer. In other words, there may be no frame-oriented buffering of pixel data between the sample-to-pixel calculation units and the display. Note that some displays may have internal frame buffers, but these are considered an integral part of the display device, not the graphics system. As used herein, the term “real-time” refers to an operation that is performed at or near the display device's refresh rate. For example, filtering samples “on-the-fly” means calculating output pixels at a rate high enough to support the refresh rate of a display device. The term “on-the-fly” refers to a process or operation that generates images at a rate near or above the minimum rate required for displayed motion to appear smooth (i.e. motion fusion) and/or for the light intensity to appear continuous (i.e. flicker fusion). These concepts are further described in the book “Spatial Vision” by Russell L. De Valois and Karen K. De Valois, Oxford University Press, 1988.
The filter weight assigned to each sample depends on the filter function being used and on the distance of the sample from the pixel center or the filter center. It is noted that the terms filter weight and filter coefficient are used interchangeably herein. For each pixel attribute (e.g. the red attribute), the pixel value (e.g. the red pixel value) is given by a weighted sum of the corresponding sample values (e.g. the red sample values) for samples falling within the filter support. If the filter weights are not pre-normalized to one, i.e. the sum of the coefficients used in each weighted sum does not equal one, then the weighted sums for the various pixel attributes may be divided by the sum of the filter weights. This sum of the filter weights is referred to herein as the normalization factor.
In cases where the filter function, the filter support, and the set of relative positions of samples with respect to the filter center remain constant from pixel to pixel (and thus, the filter coefficients remain constant), the normalization factor remains the same. In those cases, the normalization factor may be calculated once before the filtering process begins. The coefficients may be pre-normalized by dividing the original coefficients by the normalization factor to generate a set of normalized coefficients. Then, the normalized coefficients may be used in the filtering process for an array of pixels.
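As a minimal sketch of pre-normalization, assuming the coefficients are held in a plain list and using the hypothetical helper names `prenormalize` and `apply_coefficients`, the normalization factor may be computed once and folded into the coefficients before any pixel is filtered:

```python
def prenormalize(coefficients):
    """Divide each coefficient by the normalization factor (their sum)."""
    normalization_factor = sum(coefficients)
    return [c / normalization_factor for c in coefficients]

def apply_coefficients(normalized, sample_values):
    """Weighted sum for one pixel attribute; no per-pixel division is needed."""
    return sum(c * v for c, v in zip(normalized, sample_values))

# The same normalized coefficients are reused for every pixel in the array.
normalized = prenormalize([1.0, 2.0, 1.0])
pixels = [apply_coefficients(normalized, row)
          for row in ([8.0, 8.0, 8.0], [0.0, 4.0, 0.0])]
```

With the coefficients 1, 2, 1 the normalization factor is 4, so three identical sample values of 8 produce a pixel value of 8, i.e. unity gain.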
However, in many cases, the normalization factor may vary from pixel to pixel. For example, the filtering may take place over a region of non-uniform sample density or at the edges of the screen space. The size and/or shape of the filter support may change from one pixel to the next. The samples may be distributed in the screen space in a random fashion. Thus, the number of samples interior to the filter support and/or their relative positions with respect to the pixel center may vary from pixel to pixel. This implies that the normalization factor (i.e. the sum of the coefficients of the interior samples) varies from pixel to pixel.
In such cases, the normalization factor may be individually computed for each pixel, and instead of pre-normalizing the filter coefficients, the weighted sum (computed on the basis of the non-normalized coefficients) may be post-normalized. In other words, after generating a weighted sum for each attribute, each weighted sum may be divided by the normalization factor. In one embodiment, the computation of the normalization factor may be performed in parallel with the computation of one or more of the pixel attribute summations.
In one set of embodiments, one or more of the per-pixel summations (e.g. the coefficient summation and/or any combination of the attribute summations) may be performed by an adder tree. The adder tree may comprise a plurality of addition levels, and each addition level may include a set of adder cells. An adder cell may receive two input operands and generate one output operand. In one alternative embodiment, an adder cell may receive three input operands and generate two output operands.
The first addition layer may receive a set of numeric values which are to be summed. Each adder cell in the first layer may generate a sum of two (or three) of the numeric values and pass its output operand(s) to the second layer. Each adder cell in layers after the first and before the last layer may receive two (or three) output operands from the previous layer and pass its output operand(s) to the next layer. Thus, the final output from the last layer may represent a sum of all the numeric operands presented to the first layer. Registers may be placed after each adder cell in order to buffer the intermediate summation results.
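The layer-by-layer reduction may be modeled in software as follows. This is an illustrative sketch only: the name `adder_tree_sum` is hypothetical, two-input adder cells are assumed, and padding an odd-sized layer with a zero operand is an assumption not stated above.

```python
def adder_tree_sum(values):
    """Sum values by pairwise addition, one list per addition layer."""
    layer = list(values)
    if not layer:
        return 0
    while len(layer) > 1:
        if len(layer) % 2:
            layer.append(0)  # assumed: pad an odd layer with a zero operand
        # Each adder cell adds two operands and feeds the next layer.
        layer = [layer[i] + layer[i + 1] for i in range(0, len(layer), 2)]
    return layer[0]
```

Each pass of the loop corresponds to one addition layer, so n inputs are reduced in roughly log2(n) layers; in hardware, the intermediate list would correspond to the registers placed after the adder cells.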
In some embodiments, the adder tree is configured to add any desired subset of the input numeric values. Thus, in addition to the numeric values, the adder tree is configured to receive a set of data valid signals, i.e. one data valid signal for each numeric value. The data valid signal indicates whether the corresponding numeric value is to be included in the summation output from the adder tree. An adder cell may be configured to support such an adder tree as follows. The adder cell may receive two input operands X1 and X2, and two corresponding data valid inputs DV1 and DV2, and may generate a single output operand Xout. The adder cell output Xout may equal zero, X1, X2 or the sum X1+X2 depending on the state of the data valid input signals. Namely, the output equals zero when both data valid inputs are low, equals X1 when only data valid input DV1 is high, equals X2 when only data valid input DV2 is high, and equals the sum when both data valid inputs are high. The adder cell may also generate a data valid output signal DVout to indicate to an adder cell of the next layer whether the operand output Xout is “valid”, i.e. to be incorporated in a further summation or ignored. Another embodiment of the adder cell contemplates use of a carry-save adder with three operand inputs and two operand outputs. Various embodiments of circuits (such as the adder cell) presented herein are described in terms of active high logic. However, it is understood that these circuit embodiments may be realized in terms of active low logic or a combination of active high logic and active low logic.
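The behavior of the two-input adder cell with data valid signals may be modeled in software as in the sketch below. The function names are hypothetical, booleans stand in for active high signals, and taking DVout as the logical OR of DV1 and DV2 is an assumption; the text above states only what DVout indicates, not how it is derived.

```python
def adder_cell(x1, dv1, x2, dv2):
    """Return (Xout, DVout) for one two-input adder cell."""
    # Xout is 0, X1, X2 or X1 + X2 depending on which inputs are valid.
    xout = (x1 if dv1 else 0) + (x2 if dv2 else 0)
    dvout = dv1 or dv2  # assumed: output is valid if either input was valid
    return xout, dvout

def masked_adder_tree(values, valid):
    """Sum only the values whose data valid flag is set."""
    layer = list(zip(values, valid))
    while len(layer) < 2:
        layer.append((0, False))      # assumed padding with invalid operands
    while len(layer) > 1:
        if len(layer) % 2:
            layer.append((0, False))
        layer = [adder_cell(*layer[i], *layer[i + 1])
                 for i in range(0, len(layer), 2)]
    return layer[0]
```

For example, presenting the values 1, 2, 3, 4 with the second data valid signal low yields the sum of the remaining three values, together with a high data valid output.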
In one set of embodiments, the sample-to-pixel calculation unit may be configured to turn off sample filtering, and to generate pixel values based on a “winner take all” criterion. For example, the values of a current pixel may be determined by an identified sample or the first sample (in sequence order) in a memory bin corresponding to the current pixel. Alternatively, the values of the current pixel may be determined by the sample closest to the current filter center or pixel center as suggested by FIG. 31. The red, green, blue and alpha values of this closest sample are assigned as the values of the current pixel.
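The closest-sample variant of the winner-take-all criterion may be sketched as follows. The function name `winner_take_all`, the sample tuple layout, and the use of Euclidean distance are assumptions made for illustration.

```python
import math

def winner_take_all(samples, center):
    """Return the (r, g, b, a) values of the sample nearest the center.

    samples: non-empty list of (x, y, (r, g, b, a)) tuples (assumed layout).
    """
    cx, cy = center
    _, _, rgba = min(samples,
                     key=lambda s: math.hypot(s[0] - cx, s[1] - cy))
    return rgba
```

No weighting or normalization is performed; the attributes of the single selected sample are copied to the pixel unchanged.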
Previous generation video products have generated pixel values from 3D graphics primitives without intervening super-sampling and super-sample filtering. Thus, in order to satisfy users/customers who want their displayed video output to have the same appearance as a previous generation video product, the sample-to-pixel calculation unit may be programmed to disable sample-filtering and enable winner-take-all sample selection.
As described above, an adder tree may be configured to perform an addition of any desired subset of its input numeric values based on the data valid signal inputs to the adder tree. In some embodiments, the adder tree also performs winner-take-all selection of a selected one of the input numeric values. Thus, in addition to data valid signals, the adder tree may receive a set of winner-take-all signals, one winner-take-all signal per input numeric value. In the preferred embodiment, at most one of the winner-take-all signals may be high. When a winner-take-all signal is high, the adder tree passes the corresponding input numeric value to the adder tree output. When all the winner-take-all signals are low, the adder tree generates a summation of those input numeric values having high data valid signals as described above.
Such an adder tree may be facilitated by an adder cell configured as follows. The adder cell may receive two input operands X1 and X2, two corresponding data valid input signals DV1 and DV2, and two corresponding winner-take-all input signals WTA1 and WTA2. The adder cell generates an output operand Xout. When both winner-take-all input signals are low, the output operand Xout equals 0, X1, X2 or X1+X2 depending on the state of the data valid bits as before. When winner-take-all signal WTA1 is high and winner-take-all signal WTA2 is low, the output operand equals X1. When winner-take-all signal WTA2 is high and winner-take-all signal WTA1 is low, the output operand equals X2. Furthermore, the adder cell may generate a data valid output signal DVout and a winner-take-all output signal WTAout. The data valid output signal DVout indicates to an adder cell in the next layer whether or not the operand output Xout is valid so far as inclusion in a further addition is concerned. The winner-take-all output signal WTAout indicates to the next-layer adder cell whether the output operand Xout represents the winner of the winner-take-all process. Each adder cell in a given layer (after the first layer) may receive the operand output Xout, the data valid output DVout and the winner-take-all output WTAout from two adder cells of the previous layer. Thus, when one of the numeric values presented to the first layer has a winner-take-all bit set, that numeric value propagates through each layer to the adder tree output. In one alternative embodiment, an adder cell may be modified to operate with a carry-save adder, and thus, to receive three operand inputs and generate two operand outputs.
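A software model of this adder cell, and of a tree built from it, is sketched below. Several details are assumptions rather than statements of the text above: WTAout is taken as the OR of WTA1 and WTA2, DVout is taken as high whenever any data valid or winner-take-all input is high, and WTA1 is given priority in the case (excluded in the preferred embodiment) where both winner-take-all inputs are high. The function names are hypothetical.

```python
def wta_adder_cell(x1, dv1, wta1, x2, dv2, wta2):
    """Return (Xout, DVout, WTAout) for one two-input adder cell."""
    if wta1:                      # assumed priority if both WTA inputs are high
        xout = x1
    elif wta2:
        xout = x2
    else:                         # no winner: behave as the data valid adder
        xout = (x1 if dv1 else 0) + (x2 if dv2 else 0)
    wtaout = wta1 or wta2         # assumed: a winner stays flagged downstream
    dvout = dv1 or dv2 or wtaout  # assumed derivation of DVout
    return xout, dvout, wtaout

def wta_adder_tree(values, valid, wta):
    """Return the winner if any WTA flag is set, else the masked sum."""
    layer = list(zip(values, valid, wta))
    while len(layer) < 2:
        layer.append((0, False, False))
    while len(layer) > 1:
        if len(layer) % 2:
            layer.append((0, False, False))
        layer = [wta_adder_cell(*layer[i], *layer[i + 1])
                 for i in range(0, len(layer), 2)]
    return layer[0][0]
```

With the inputs 10, 20, 30, 40 all valid, setting the winner-take-all flag on the third input causes 30 to propagate to the output; with all flags low, the same tree returns the ordinary sum of the valid inputs.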
Typically, a different summation may be loaded into the adder tree every n clock cycles. This period of time, i.e. the n clock cycles, may be referred to as an adder cycle. In certain cases, however, for one or more adder cycles, no valid data may be introduced into the adder tree. This may occur, for example, in cases where the rate of outputting pixels is much less than the native rate of the graphics system.
In cases where the filter support covers regions with two or more different sample densities, the samples from the lower density regions may contribute less to the final pixel value than the samples from the higher density region. This is because there are typically fewer samples in the lower density region. In one embodiment, the filter coefficients corresponding to samples from the lower sample density regions may be multiplied by a factor approximately equal to the ratio of the sample densities. This may provide more weight to the less-represented samples from the lower density region. In cases where the filter support may include more than two regions of different sample densities, filter coefficients for samples in other regions may also be multiplied by a factor equal to the ratio of the sample densities.
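The coefficient scaling may be sketched as follows, assuming each sample carries the sample density of the region it came from and that the highest density within the support is taken as the reference (both assumptions; the function name is hypothetical).

```python
def density_compensated_weights(weights, densities):
    """Scale each filter coefficient by (reference density / sample's density).

    weights:   filter coefficients, one per interior sample.
    densities: sample density of the region containing each sample.
    """
    reference = max(densities)  # assumed reference: the densest region
    return [w * reference / d for w, d in zip(weights, densities)]
```

For instance, a sample from a region with one quarter of the reference density has its coefficient multiplied by four, while coefficients in the densest region are unchanged. Since the scaled coefficients no longer sum to the original total, the normalization factor would be computed from the scaled values.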
In another embodiment, as the sample density decreases, the extent (e.g., diameter) of the filter may be increased in order to keep the number of samples included in the filtering approximately constant. For example, in an embodiment where the filter is circularly symmetric, the square of the support diameter of the filter may be set to a value that is inversely proportional to the sample density in that region.
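For a circular support of diameter d in a region of density ρ samples per unit area, the expected number of interior samples is ρ·π·d²/4. Holding that count at a target N gives d² = 4N/(πρ), which is inversely proportional to the density as stated above. A minimal sketch, with the function name and the target-count parameter being assumptions:

```python
import math

def support_diameter(target_samples, density):
    """Diameter of a circular support expected to contain target_samples.

    density: samples per unit area in the region being filtered.
    """
    return math.sqrt(4.0 * target_samples / (math.pi * density))
```

Under this relation, reducing the sample density by a factor of four doubles the support diameter.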