The invention relates generally to digital image processing and display of digitally rendered images.
Rendering of three-dimensional scenes typically requires realistic representation of multiple objects in the field of view. The distance of each object from the point of view (also known in 3D graphics as camera position) can determine whether the object blocks (occludes) or is blocked by other objects in the scene. Even in the case of a single object, some of its parts may block or be blocked by other parts depending upon each part's distance from the point of view. Methods and apparatus used to resolve occlusions and eliminate hidden surfaces play an important role in the creation of realistic images of three-dimensional scenes.
To work effectively, hidden surface elimination methods should have a depth resolution better than the minimal distance between the occluding object and the occluded object in the scene. Such a method should also be simple enough to be implemented in conventional low-cost graphics hardware that accelerates three-dimensional rendering, or in conventional software-only graphics renderers when a hardware accelerator is not available.
Many popular algorithms for hidden surface elimination utilize a special depth buffer, also known as a Z-buffer. Each new pixel at two-dimensional location X, Y on the screen is associated with depth value Z. This value is compared with a depth value stored in the depth buffer at the location corresponding to the same X, Y coordinate. A visibility test compares the new and stored depth values; if the visibility test passes, meaning the new object is closer and therefore blocks the portion of the prior object at the same coordinates, then the depth value in the depth buffer is updated.
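As a minimal sketch of this visibility test, assuming a tiny in-memory frame, depth normalized to [0, 1] with smaller values closer to the camera, and a clear value of 1.0 (the far plane), the buffer update can be written as follows. The names `write_pixel`, `depth_buffer`, and `color_buffer` are hypothetical:

```python
# Hypothetical 4x3 frame: depth is normalized to [0, 1], smaller = closer.
WIDTH, HEIGHT = 4, 3
CLEAR_DEPTH = 1.0  # far plane: the largest value the buffer can hold

depth_buffer = [[CLEAR_DEPTH] * WIDTH for _ in range(HEIGHT)]
color_buffer = [[None] * WIDTH for _ in range(HEIGHT)]


def write_pixel(x, y, z, color):
    """Visibility test at (x, y); on success, update depth and color buffers."""
    if z < depth_buffer[y][x]:   # new pixel is closer than what is stored
        depth_buffer[y][x] = z
        color_buffer[y][x] = color
        return True
    return False                 # hidden behind the stored pixel


write_pixel(1, 1, 0.8, "mountain")
write_pixel(1, 1, 0.3, "tree")    # closer: overwrites the mountain
write_pixel(1, 1, 0.5, "house")   # behind the tree: rejected
print(color_buffer[1][1])         # tree
```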
Where objects in the scene are rendered as a collection of triangularly shaped surfaces, values of X, Y and Z are computed for each vertex of each triangle by transforming three-dimensional vertex coordinates from the view space (a regular three-dimensional space having an origin of the coordinates aligned with the camera position) to a screen space (a three-dimensional space with the X-Y plane parallel to the screen, but distorted as a result of perspective projection). During this transformation the actual depth of the object in the camera field of view Zv is mapped to the depth Zs in the screen space.
After values Zs are computed for every vertex of a triangle, they are linearly interpolated for every pixel of the surface defined by the triangle during triangle rasterization. The interpolated value for each pixel is then compared with the Zs value stored in the Z-buffer at the corresponding location to test the visibility of the current pixel. If the current pixel is located behind the point represented by the stored Zs value (i.e., the distance from the camera represented by the Zs interpolated for the current pixel is greater than the distance from the camera represented by the Zs stored in the depth buffer at the same coordinates X, Y), the pixel is not visible and will not be displayed.
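Per-pixel interpolation of the vertex depths can be sketched with barycentric (edge-function) weights, one common way to rasterize triangles; the function names, the sample triangle, and the stored value are illustrative assumptions:

```python
def edge(ax, ay, bx, by, px, py):
    """Twice the signed area of the triangle (a, b, p)."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def interpolate_zs(v0, v1, v2, px, py):
    """Linearly interpolate per-vertex Zs at screen point (px, py).

    Each vertex is a tuple (x, y, zs). Returns None outside the triangle.
    """
    area = edge(v0[0], v0[1], v1[0], v1[1], v2[0], v2[1])
    w0 = edge(v1[0], v1[1], v2[0], v2[1], px, py) / area  # weight of v0
    w1 = edge(v2[0], v2[1], v0[0], v0[1], px, py) / area  # weight of v1
    w2 = edge(v0[0], v0[1], v1[0], v1[1], px, py) / area  # weight of v2
    if min(w0, w1, w2) < 0.0:
        return None
    return w0 * v0[2] + w1 * v1[2] + w2 * v2[2]


tri = ((0.0, 0.0, 0.2), (4.0, 0.0, 0.6), (0.0, 4.0, 1.0))
zs = interpolate_zs(*tri, 1.0, 1.0)   # weights 0.5, 0.25, 0.25 give Zs = 0.5
stored = 0.4
print("visible" if zs < stored else "hidden")   # hidden: 0.5 lies behind 0.4
```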
Here a pixel is defined as a set of parameters representing an area of the object's surface that corresponds to a point of the raster grid associated with the screen coordinate space. These parameters can include the two-dimensional coordinates of the point in the raster grid, as well as its color and depth values, which are stored at the corresponding locations in a color buffer and in a depth buffer. A pixel is visible if its color and depth values remain stored at the corresponding locations in the color buffer and in the depth buffer after scene rendering is completed. A pixel is invisible if its parameters are overwritten by another pixel having a depth value corresponding to a smaller distance from the camera.
The mapping between Zv and Zs is non-linear because of the non-linear nature of a perspective projection. However, this mapping should satisfy certain constraints to avoid gross errors during linear interpolation and to fully utilize the precision of a Z-buffer.
First, to avoid errors during linear interpolation, lines and planes in the view space must be transformed into lines and planes in the screen space. Second, to maximize the precision of the Z-buffer, depth values Zs in the screen space should span the full range from the smallest to the largest value that can be stored in the Z-buffer. Usually this range is mapped to the interval [0,1], so Zs is normalized to [0,1]. An additional constraint, which supports an intuitive notion of depth, is that a point farther from the camera in the view space (larger Zv) also has a larger depth in the screen space (larger Zs). As shown by Newman, W. M. and Sproull, R. F. (Principles of Interactive Computer Graphics, 1981, McGraw-Hill, New York), these conditions are satisfied by the following relation between Zv and Zs:
$$Z_s = \frac{f}{f-d}\left(1 - \frac{d}{Z_v}\right) \qquad [1]$$
where f and d are, correspondingly, distances from the camera to the far and near clipping planes bounding the view volume in the screen space.
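Equation [1] transcribes directly into code; the function name is illustrative. With f/d = 100, the sketch shows that surfaces between the near plane and Zv = 10 (about 9% of the distance range) already use about 91% of the normalized [0,1] depth range:

```python
def zs_standard(zv, d, f):
    """Equation [1]: screen-space depth for view distance zv (near d, far f)."""
    return f / (f - d) * (1.0 - d / zv)


d, f = 1.0, 100.0
print(zs_standard(d, d, f))      # 0.0 at the near plane
print(zs_standard(f, d, f))      # approximately 1.0 at the far plane
print(zs_standard(10.0, d, f))   # approximately 0.909
```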
Equation [1] is widely used in computer graphics to compute depth in the screen space, store results in the depth buffer and evaluate visibility of rendered surfaces. However, the equation suffers from the disadvantage that it is a non-linear mapping between Zv and Zs. This makes Zs less sensitive to changes in Zv which are close to the far end of the view volume as compared with changes in Zv which are close to the near end of the view volume.
For instance, if the ratio of the distances to the far and near planes equals 100, a small change in Zv close to the near plane causes a change in Zs 10,000 times larger than the same change in Zv close to the far plane.
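Differentiating equation [1] shows where this factor comes from: the slope of Zs falls off with the square of the view distance, so the ratio of the slopes at the near and far planes is the square of f/d:

$$\frac{dZ_s}{dZ_v} = \frac{f}{f-d}\cdot\frac{d}{Z_v^{2}}, \qquad \frac{\left.dZ_s/dZ_v\right|_{Z_v=d}}{\left.dZ_s/dZ_v\right|_{Z_v=f}} = \frac{f^{2}}{d^{2}} = 100^{2} = 10{,}000$$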
A ratio of the distances to the far and near planes in excess of 100 is typical for three-dimensional applications that render large open spaces. For example, a flight simulator may have visual distances ranging from 0.1 mile for the closest point on the ground to 10 miles for the mountains near the horizon. If the depth buffer has a total resolution of 16 bits (yielding a Zs range from 0 to 65535), a minimum change from Zs=0 to Zs=1 (close to the camera) corresponds to a change in the object's distance of about 0.096 inches, while a change from Zs=65534 to Zs=65535 (far from the camera) corresponds to a change in the object's distance of about 80 feet.
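The sketch below, assuming the normalized Zs of equation [1] is scaled to 16-bit integers (0 to 65535), inverts the equation to find the view distance of each stored code and measures the distance covered by one code step at each end of the flight-simulator view volume. The function name and unit constants are illustrative:

```python
MILE_IN_INCHES = 63360
MILE_IN_FEET = 5280


def zv_from_stored(q, d, f, bits=16):
    """Invert equation [1] for an integer depth-buffer value q in [0, 2**bits - 1]."""
    zs = q / ((1 << bits) - 1)               # normalized Zs in [0, 1]
    return d / (1.0 - zs * (f - d) / f)


d, f = 0.1, 10.0   # near and far distances in miles
near_step_in = (zv_from_stored(1, d, f) - zv_from_stored(0, d, f)) * MILE_IN_INCHES
far_step_ft = (zv_from_stored(65535, d, f) - zv_from_stored(65534, d, f)) * MILE_IN_FEET
print(f"near: {near_step_in:.3f} in, far: {far_step_ft:.1f} ft")
```

With d = 0.1 mile and f = 10 miles it reports steps of roughly 0.096 inches near the camera and roughly 79.6 feet at the far plane, a ratio of about 10,000.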
Large ratios of distances to the far and near planes are also found in applications that render closed interior spaces, especially if such interiors include sets of mirrors with interdependent reflections.
As a result, applications that have a high ratio of distances between the far and near planes typically encounter errors when attempting to eliminate hidden surfaces distant from the camera. These errors can be experienced as, e.g., mountains or buildings randomly appearing in front of each other. Application developers have sometimes attempted to solve this problem by blending far regions of the scene with fog color, effectively decreasing visible range.
In some cases, this solution is satisfactory and provides good depth buffer resolution in the areas close to the viewer; however, in other instances blending of distant objects with fog can detract from the realism of the scene. It is often more desirable to provide high resolution throughout the view volume, whether close or far from the camera.
One method for improving the resolution of the depth buffer is to store the distance from the camera directly in fixed-point format (this solution is also known as a W-buffer, in reference to the parameter 1/W, which is supplied for each vertex for perspective-correct texture mapping and is usually proportional to 1/Zv). To produce correct results, 1/W has to be interpolated for each pixel, followed by a high-precision computation of its reciprocal. These high-precision per-pixel operations make hardware implementations more difficult; as a result, implementations of this method are generally limited to software.
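A W-buffer fill for one horizontal span can be sketched as follows, assuming W equals the view distance Zv and a 16-bit fixed-point format that stores W as a fraction of the far distance f; `w_buffer_span` is a hypothetical name. The per-pixel reciprocal inside the loop is the high-precision operation that makes hardware implementation difficult:

```python
def w_buffer_span(w_left, w_right, n_pixels, f, bits=16):
    """Hypothetical W-buffer fill for one horizontal span of n_pixels (>= 2).

    Assumes W equals the view distance Zv. 1/W varies linearly in screen
    space, so it is interpolated; each pixel then needs a reciprocal to
    recover W, which is stored as a fixed-point fraction of the far distance f.
    """
    scale = (1 << bits) - 1
    inv_left, inv_right = 1.0 / w_left, 1.0 / w_right
    stored = []
    for i in range(n_pixels):
        t = i / (n_pixels - 1)
        inv_w = inv_left + t * (inv_right - inv_left)  # linear in screen space
        w = 1.0 / inv_w                                # per-pixel reciprocal
        stored.append(round(w / f * scale))            # fixed-point distance
    return stored


print(w_buffer_span(1.0, 100.0, 5, 100.0))
```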
To decrease the complexity of hardware implementation, one method described in U.S. Pat. No. 5,856,829 stores depth as an inverse distance to the camera (1/Zv or 1/W), without computing a high-precision Zv or W for each pixel. While this solution may increase precision in comparison with a standard Z-buffer, it typically does not utilize the full range of depth buffer values and may create incorrect visual artifacts for scenes with distances close to the near and far planes.
Another method for improving the resolution of a depth buffer applies per-pixel reformatting operations to the Z value after interpolation, such as a square root or logarithm function (as shown in U.S. Pat. No. 5,808,618), or a block-fixed format (as described in U.S. Pat. No. 5,856,829). In both cases, such per-pixel operations can increase hardware cost and can degrade performance during critical stages of the rendering pipeline. A further difficulty with such conversions into special formats is that the stored value becomes harder to interpret by an application that accesses the depth buffer directly (for instance, to identify the object stored at a given location by its depth). Also, block-fixed formats can cause sharp precision changes at the boundaries of fixed-format ranges, which may cause undesirable image artifacts.
Another way to increase the resolution of the depth buffer is to store a floating-point value of the depth instead of a fixed-point value. The floating-point value is stored as a combination of exponent and mantissa (with an optional sign bit, if the value can be negative). Floating-point storage improves the effective resolution of the inverse W-buffer; however, this improvement is limited because the inverse W-buffer does not cover the entire available storage range: in particular, it excludes from the representation of depth the smallest floating-point values, where the exponent changes most rapidly. Incomplete compensation for the non-linear decrease of depth resolution at small 1/W values can make the precision of an inverse W-buffer significantly lower than that of a fixed-point W-buffer with the same per-pixel storage size.
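To make the coverage argument concrete, the sketch below assumes 32-bit IEEE 754 storage and an inverse W-buffer normalized as d/Zv (so the near plane maps to 1.0 and the far plane to d/f), and counts how many single-precision values the stored depth can actually take when f/d = 100. Only about 5% (56,371,447 of 1,065,353,217) of the non-negative values up to 1.0 are reachable; everything below d/f, including the densely spaced smallest values, is never used. The helper names are illustrative:

```python
import struct


def f32_bits(x):
    """Bit pattern of x after rounding to IEEE 754 single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def count_f32(lo, hi):
    """Number of single-precision values in [lo, hi], for 0 <= lo <= hi.

    For non-negative floats the bit patterns increase monotonically with the
    value, so the count is a difference of bit patterns.
    """
    return f32_bits(hi) - f32_bits(lo) + 1


d, f = 1.0, 100.0
used = count_f32(d / f, 1.0)       # normalized inverse depth d/Zv spans [d/f, 1]
available = count_f32(0.0, 1.0)    # all single-precision values in [0, 1]
print(used, available, f"{used / available:.1%}")
```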
Another disadvantage common to both W-buffers and inverse W-buffers is that they require, for every vertex, a depth value linearly proportional to the distance Zv from the camera. 3D rendering APIs such as OpenGL or Direct3D do not require a graphics application to provide a parameter proportional to Zv if the application performs the coordinate transformation to the screen space by itself. Thus, a graphics application can omit the parameter W, for instance if it does not require perspective-correct texture mapping, or can use a W that is non-linear in relation to Zv, as in the case of projected textures. Therefore, both W-buffer and inverse W-buffer methods can generally be used only in a subset of 3D rendering applications and cannot be considered universal solutions.
In general, in one aspect, the invention features a method for evaluating the depth of a pixel in a scene, the scene enclosed in a view volume, the scene to be rendered from a camera position, the view volume having a near and a far plane, including calculating a depth value for a pixel in the scene, the depth value being generated by a depth function of view distance within the view volume from the camera position, and storing the depth value in a floating-point format, the floating-point format including a mantissa and exponent, where, as the distance of the pixel to the far plane decreases, the absolute magnitude of the depth value generated by the depth function approaches the minimum non-negative number representable by the floating-point format.
Embodiments of the invention may include one or more of the following features. For a pixel within the view volume having the minimal non-zero distance to the far plane represented by a unique stored depth value, the depth value can be stored as the smallest positive floating-point number representable by the floating-point format. For a pixel at the far plane, the depth value can be stored as 0 in the floating-point representation. For a pixel within the view volume having the minimal non-zero distance to the near plane represented by a unique stored depth value, the depth value can be stored as the second largest floating-point number representable by the floating-point format. For a pixel at the near plane, the depth value can be stored as the largest floating-point number representable by the floating-point format.
The depth function can generate depth values whose values or magnitudes decrease with increasing distance from the camera in the view volume. A new depth value can be calculated for a new pixel in the scene, the new pixel corresponding to the same location on a raster grid as the pixel, the new depth value being generated by the depth function, and the new depth value for the new pixel can be compared with a stored depth value for the pixel. The comparing step can further include determining whether the new depth value is greater than, equal to, or less than the stored depth value for the pixel, and depending upon whether it is greater than, equal to, or less than the stored depth value, indicating that the new pixel is visible or invisible.
The depth function, represented by Zs, can be:
$$Z_s = \frac{f}{f-d}\left(\frac{f}{Z_v} - A\right)$$
where d is the distance to a near plane of the view volume, f is the distance to a far plane of the view volume, Zv is the distance to a particular pixel in the view volume, and A is a constant. A can be equal to 1.
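A minimal sketch, assuming 32-bit IEEE 754 storage (emulated with Python's `struct` module) and A = 1, compares equation [1] with this depth function for 20,000 surfaces spaced evenly within 0.1 mile of the far plane of the flight-simulator example (d = 0.1, f = 10). Because the decreasing function sends the far plane to 0, where floating-point values are densest, every surface keeps a distinct stored depth, whereas equation [1] collapses them onto fewer than 2,000 values. The function names are hypothetical:

```python
import struct


def to_f32(x):
    """Round a Python float to IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def zs_standard(zv, d, f):
    """Equation [1]: increases with distance, 0 at the near plane, 1 at the far plane."""
    return f / (f - d) * (1.0 - d / zv)


def zs_decreasing(zv, d, f, a=1.0):
    """Depth function with A = a: decreases with distance, 0 at the far plane."""
    return f / (f - d) * (f / zv - a)


d, f, n = 0.1, 10.0, 20000
far_zv = [f - 0.1 + 0.1 * i / (n - 1) for i in range(n)]  # within 0.1 of the far plane
std = {to_f32(zs_standard(z, d, f)) for z in far_zv}
dec = {to_f32(zs_decreasing(z, d, f)) for z in far_zv}
print(len(std), len(dec))
```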
In general, in another aspect, the invention features a method for evaluating the depth of a pixel in a scene, the scene enclosed in a view volume, the scene to be rendered from a camera position, the view volume having a near and a far plane, including calculating a depth value for a pixel in the scene, the depth value being generated by a depth function of view distance within the view volume from the camera position, and storing the depth value in a floating-point format, the floating-point format including a mantissa and exponent, where substantially the entire set of points of view distance, from the near to far plane in the view volume, is mapped by the depth function to substantially the entire set of floating-point numbers representable in the floating-point format.
Embodiments of the invention may include one or more of the following features. A pixel located at the near plane can be mapped to the floating-point number with the maximum absolute magnitude representable in the floating-point format, and a pixel located at the far plane can be mapped to the floating-point number with the minimum absolute magnitude representable in the floating-point format. The pixel located at the near plane can be mapped to the largest positive or to the largest negative floating-point value representable in the floating-point format, and the pixel located at the far plane can be mapped to 0 in the floating-point format. The pixel parameters can be derived from one or more polygons having vertices. The depth value for the pixel can be generated by transforming vertex coordinates of a polygon of the object using the depth function and then interpolating one or more vertex coordinates of the object in the screen space to determine the depth value of the pixel. The depth function can compute the depth value by subtracting a depth value interpolated per pixel from a constant value. The constant value can be equal to the maximum depth value that can be stored in the floating-point format. The depth function can compute the depth value by subtracting a constant value from the reciprocal of the homogeneous coordinate W of the view distance corresponding to the pixel. The constant value can be equal to the reciprocal of the homogeneous coordinate W corresponding to the view distance from the camera to the far plane. The depth function can compute the depth value by multiplying the homogeneous coordinate W of the view distance corresponding to the pixel by a scale factor, and subtracting a constant value.
The constant value and scale factor can be selected such that the depth function generates depth values for the near and far planes that have, respectively, the maximum and minimum absolute magnitudes that can be represented by the floating-point format.
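Taking W equal to Zv (an assumption made for this derivation), the subtraction-based variants can be checked against equation [1] algebraically. Subtracting equation [1] from the constant 1 gives a depth that is 1 at the near plane and 0 at the far plane, and the same function is a scaled difference between 1/W and the constant 1/f; it also equals d/f times the depth function with A = 1:

$$1 - Z_s = 1 - \frac{f}{f-d}\left(1 - \frac{d}{Z_v}\right) = \frac{d}{f-d}\left(\frac{f}{Z_v} - 1\right) = \frac{fd}{f-d}\left(\frac{1}{W} - \frac{1}{f}\right), \qquad W = Z_v$$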
In general, in another aspect, the invention features a method for evaluating the depth of a pixel in a scene, the scene enclosed in a view volume, the scene to be rendered from a camera position, the view volume having a near and a far plane, the method including calculating one or more depth values for pixels in the scene, the depth values being generated by a piece-wise continuous function of distance within the view volume from the camera position, and storing the depth values as floating-point numbers in a depth buffer using a floating-point format, such that for at least one pixel inside the view volume at which the derivative of the depth value function over the distance within the view volume has the smallest absolute magnitude over the view volume, its depth value is stored using a floating-point number having the smallest absolute magnitude representable in the floating-point format.
In general, in another aspect, the invention features a method for visibility testing of pixels rendered during rasterization of a scene in a sequence of scenes, each scene consisting of objects having points represented by coordinates in a view volume, the scene to be rendered from a camera position into a screen space having an associated raster grid, the view volume having a near and a far plane, the method including generating depth values for at least one pixel of an object using a depth function, and comparing the generated depth value with a depth value stored in a depth buffer for the same location on the raster grid, where said stored depth value represents distance to the location from the camera, where the stored depth values for a location on the raster grid, with respect to two different scenes, are stored in at least two different numerical formats having the same number of bits per pixel, respectively.
Embodiments of the invention may include one or more of the following features. At least one of the numerical formats can be a floating-point representation including a mantissa and an exponent. The at least two different numerical formats can be floating-point formats having the same total number of bits but with different numbers of bits for exponent and mantissa. The number of bits for exponent and mantissa for the depth values for a scene can be selected based upon one or more known parameters of the scene. One of the known parameters of the scene can be the ratio of distances from the camera of the far and near planes. One of the known parameters of the scene can be a distribution of the distances from the camera to the surfaces of the objects in the scene. One of the known parameters of the scene can be the location of an area of interest inside the view volume. The size of the exponent used for a scene with a larger ratio of distances to the far and near planes can be larger than for a scene with a smaller ratio of distances to the far and near planes, both scenes having substantially the same distribution of the distances from the camera to the surfaces of the objects in the scene. The size of the exponent used for a scene with an area of interest more distant from the camera can be larger than for a scene with an area of interest less distant from the camera, both scenes having substantially the same ratio of the distances to the far and near planes.
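The trade-off behind choosing exponent and mantissa sizes can be sketched with a hypothetical 16-bit unsigned floating-point format. The assumptions are all illustrative: no sign bit, an IEEE-style bias, subnormals, no codes reserved for infinity or NaN, and the decreasing depth d/(f-d)·(f/Zv - 1) scaled so the near plane maps to the largest representable value and the far plane to 0. For f/d = 1000, moving bits from the mantissa to the exponent makes the smallest resolvable distance step at the far plane much finer while making the step at the near plane coarser:

```python
def format_params(exp_bits, man_bits):
    """Smallest positive value, largest value, and the spacing just below the
    largest value for a hypothetical unsigned float format (no sign bit,
    IEEE-style bias, subnormals, no infinity/NaN codes)."""
    bias = (1 << (exp_bits - 1)) - 1
    max_exp = (1 << exp_bits) - 1 - bias            # all exponent codes are finite
    smallest = 2.0 ** (1 - bias - man_bits)          # smallest subnormal
    largest = (2.0 - 2.0 ** -man_bits) * 2.0 ** max_exp
    top_gap = 2.0 ** (max_exp - man_bits)
    return smallest, largest, top_gap


def depth_steps(exp_bits, man_bits, d, f):
    """View-distance steps resolvable at the near and far planes when the
    stored depth is largest * d/(f-d) * (f/Zv - 1) (near -> largest, far -> 0)."""
    smallest, largest, top_gap = format_params(exp_bits, man_bits)
    # Near plane: distance spanned by the gap between the two largest codes.
    y = (top_gap / largest) * (f - d) / f
    near_step = d * y / (1.0 - y)
    # Far plane: distance spanned between code 0 and the smallest non-zero code.
    x = (smallest / largest) * (f - d) / d
    far_step = f * x / (1.0 + x)
    return near_step, far_step


d, f = 1.0, 1000.0
for e in (3, 4, 5, 6):
    m = 16 - e
    near_step, far_step = depth_steps(e, m, d, f)
    print(f"{e}-bit exponent / {m}-bit mantissa: "
          f"near step {near_step:.2e}, far step {far_step:.2e}")
```

The closed forms d·y/(1 - y) and f·x/(1 + x) avoid computing f/(1 + x) and subtracting it from f, which would round to zero in double precision for the larger exponents.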
In general, in another aspect, the invention features a method for visibility testing of pixels rendered during rasterization of a scene in a sequence of scenes, each scene consisting of objects having points represented by coordinates in a view volume, the scene to be rendered from a camera position into a screen space having an associated raster grid, the view volume having a near and a far plane, the method including generating depth values for at least one pixel corresponding to a point in the view volume using a depth function, and comparing the generated depth value with a depth value stored in a depth buffer for the same location on the raster grid, where said stored depth value represents distance from the camera to the point in the view volume, where depth values for the same location on the raster grid for the same scene can be stored using at least two selectable different modes of operation, the two selectable modes of operation generating different stored depth values for the same distance from the camera to the point in the view volume and for the same distances from the camera to the near and far planes in the view volume.
Embodiments of the invention may include one or more of the following features. The at least two selectable different modes of operation can be at least two different functions for mapping view distance of the pixel from the camera to the stored depth value. A selection between the at least two different functions can include switching between two different matrices that transform pixel coordinates from the view space to the screen space. During each mode of operation the depth value of the vertex can be generated by transforming coordinates of the vertex to screen space, the transformation producing positive preliminary depth values that increase with increasing distance to the camera, where for a first mode of operation the preliminary depth value of the vertex is modified such that the absolute value of the result of the first mode of operation decreases with an increase of the distance to the camera, while for a second mode of operation the resulting depth value of the vertex is substantially unchanged from the preliminary depth value. During each mode of operation the depth value of a pixel can be generated by transforming vertex coordinates of the object to screen space, the transformation producing positive depth values that increase with increasing distance to the camera, and by interpolating per-vertex depth values to determine a preliminary depth value of the pixel, where for a first mode of operation the preliminary depth value of the pixel is modified such that the absolute value of the result of the first mode of operation decreases with an increase of the distance to the camera, while for a second mode of operation the resulting depth value of the pixel is substantially unchanged from the preliminary depth value.
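Switching between the two modes by swapping matrices can be sketched as follows. The sketch assumes a row-vector convention ([x y z 1] times M, with W taken as Zv), similar in layout to common Direct3D-style perspective matrices; only the z and w columns matter for depth, and the function names are hypothetical. The first mode reproduces equation [1]; the second produces 1 - Zs, which decreases with distance:

```python
def make_projection(d, f, complementary):
    """Hypothetical 4x4 matrix, row-vector convention: clip = [x, y, z, 1] @ M.

    Standard mode yields equation [1] after division by w; complementary mode
    yields 1 - Zs (1 at the near plane, 0 at the far plane).
    """
    if complementary:
        a, b = -d / (f - d), f * d / (f - d)
    else:
        a, b = f / (f - d), -f * d / (f - d)
    return [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, a, 1.0],   # z' receives a*Zv; w receives Zv
        [0.0, 0.0, b, 0.0],   # z' receives + b
    ]


def screen_depth(zv, m):
    """Transform the view-space point (0, 0, zv, 1) and divide z by w."""
    v = [0.0, 0.0, zv, 1.0]
    clip = [sum(v[r] * m[r][c] for r in range(4)) for c in range(4)]
    return clip[2] / clip[3]


d, f = 1.0, 100.0
standard = make_projection(d, f, complementary=False)
comp = make_projection(d, f, complementary=True)
for zv in (1.0, 10.0, 100.0):
    print(zv, screen_depth(zv, standard), screen_depth(zv, comp))
```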
In general, in another aspect, the invention features apparatus for evaluating the depth of a pixel in a scene, the scene enclosed in a view volume, the scene to be rendered from a camera position, the view volume having a near and a far plane, including a depth value calculation module configured to calculate a depth value for a pixel in the scene, the depth value being generated by a depth function of view distance within the view volume from the camera position, and a depth storage module configured to store the depth value in a depth value storage buffer using a floating-point format, the floating-point format including a mantissa and exponent, the depth value calculation module configured to calculate the depth value for the pixel such that, as the distance of the pixel to the far plane decreases, the absolute magnitude of the depth value generated by the depth function approaches the minimum non-negative number representable by the floating-point format of the depth value storage buffer.
In general, in another aspect, the invention features apparatus for visibility testing of pixels rendered during rasterization of a scene in a sequence of scenes, each scene consisting of objects having points represented by coordinates in a view volume, the scene to be rendered from a camera position into a screen space having an associated raster grid, the view volume having a near and a far plane, including a depth value calculation module configured to generate depth values for at least one pixel of an object using a depth function, and a visibility test module configured to compare the generated depth value with a depth value stored in a depth value storage buffer for the same location on the raster grid, where said stored depth value represents distance to the location from the camera, where the stored depth values for a location on the raster grid, with respect to two different scenes, are stored in at least two different numerical formats having the same number of bits per pixel, respectively.
In general, in another aspect, the invention features apparatus for visibility testing of pixels rendered during rasterization of a scene in a sequence of scenes, each scene consisting of objects having points represented by coordinates in a view volume, the scene to be rendered from a camera position into a screen space having an associated raster grid, the view volume having a near and a far plane, including a depth value calculation module configured to generate depth values for at least one pixel corresponding to a point in the view volume using a depth function, and a visibility test module configured to compare the generated depth value with a depth value stored in a depth value storage buffer for the same location on the raster grid, where said stored depth value represents distance from the camera to the point in the view volume, where depth values for the same location on the raster grid for the same scene can be stored using at least two selectable different modes of operation, said modes of operation producing different stored depth values for the same distance of the point in the view volume from the camera and the same distances of the far and near planes of the view volume from the camera.
Advantages of the invention may include one or more of the following. Depth resolution of objects located close to the far plane of the view volume can be increased. The maximal ratio between distances to the far plane and to the near plane of the view volume corresponding to a pre-determined depth resolution of objects in the view volume can also be increased. The dependency of the depth resolution on the ratio between distances to the far plane and to the near plane of the view volume can be decreased. Also, the dependency of the depth resolution on the position of the object inside the view volume can be decreased. Further, depth resolution can be improved without requiring additional per-pixel operations that can slow down rendering or increase the complexity of a hardware implementation. Also, the full range between minimal and maximal values that can be stored in the depth buffer can be utilized for each three-dimensional scene, independent of the ratio of distances between the far and near planes of the view volume. Also, the invention can provide for user-controlled selection between the highest depth resolution for objects close to the far plane of the view volume and the highest depth resolution for objects close to the near plane of the view volume. Also, depth values do not need to be linearly proportional to distance or inverse distance in the view space, as required by W-buffers and inverse W-buffers, thereby supporting a larger set of 3D rendering applications.
These and other features and advantages of the present invention will become more apparent from the following description, drawings, and claims.