The invention relates generally to digital image processing and display of digitally rendered images.
Rendering of three-dimensional scenes typically requires realistic representation of multiple objects in the field of view. The distance of each object from the point of view (also known in 3D graphics as the camera position) can determine whether the object blocks (occludes) or is blocked by other objects in the scene. Even in the case of a single object, some of its parts may block or be blocked by other parts depending upon each part's distance from the point of view. Methods and apparatus used to resolve occlusions and eliminate hidden surfaces play an important role in the creation of realistic images of three-dimensional scenes.
To work effectively, hidden surface elimination methods should have a depth resolution better than the minimal distance between the occluding object and the occluded object in the scene. Such a method should also be simple enough to be implemented in conventional low-cost graphics hardware that accelerates three-dimensional rendering, or in conventional software-only graphics renderers when a hardware accelerator is not available.
Many popular algorithms for hidden surface elimination utilize a special depth buffer, also known as a Z-buffer. Each new pixel at two-dimensional location X, Y on the screen is associated with depth value Z. This value is compared with a depth value stored in the depth buffer at the location corresponding to the same X, Y coordinate. A visibility test compares the new and stored depth values; if the visibility test passes, meaning the new object is closer and therefore blocks the portion of the prior object at the same coordinates, then the depth value in the depth buffer is updated.
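The visibility test described above can be illustrated with a short sketch. It is a minimal example and not any particular renderer's implementation: the function names are hypothetical, and it assumes depth values normalized to [0, 1], a smaller value meaning closer to the camera, and a depth buffer cleared to the far value 1.0.

```python
# Minimal sketch of a Z-buffer visibility test (all names are hypothetical).
# Assumption: depth is normalized to [0, 1]; smaller means closer to the camera.

def make_buffers(width, height, far_depth=1.0, background=(0, 0, 0)):
    """Create a depth buffer cleared to the far value and a matching color buffer."""
    depth = [[far_depth] * width for _ in range(height)]
    color = [[background] * width for _ in range(height)]
    return depth, color

def write_pixel(depth, color, x, y, z, rgb):
    """Store the pixel only if it passes the visibility test at (x, y)."""
    if z < depth[y][x]:      # new surface is closer than the stored one
        depth[y][x] = z      # update the depth buffer
        color[y][x] = rgb    # update the color buffer
        return True
    return False             # occluded: both buffers are left unchanged

depth, color = make_buffers(4, 4)
write_pixel(depth, color, 1, 2, 0.7, (255, 0, 0))  # distant red surface
write_pixel(depth, color, 1, 2, 0.3, (0, 255, 0))  # nearer green surface replaces it
write_pixel(depth, color, 1, 2, 0.5, (0, 0, 255))  # blue surface lies behind green
```

After the three writes, location (1, 2) holds the green surface, because it has the smallest depth of the three candidates.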
Where objects in the scene are rendered as a collection of triangularly shaped surfaces, values of X, Y and Z are computed for each vertex of each triangle by transforming three-dimensional vertex coordinates from the view space (a regular three-dimensional space having an origin of the coordinates aligned with the camera position) to a screen space (a three-dimensional space with the X-Y plane parallel to the screen, but distorted as a result of perspective projection). During this transformation the actual depth of the object in the camera field of view Zv is mapped to the depth Zs in the screen space.
After values Zs are computed for every vertex of a triangle, they are linearly interpolated for every pixel of the surface defined by the triangle during triangle rasterization. Then, interpolation results for each pixel are compared with the Zs values stored in the Z-buffer at the corresponding location to test the visibility of the current pixel. If the current pixel is located behind the current Zs value (i.e., the distance from the camera represented by Zs interpolated for the current pixel is greater than the distance from the camera represented by Zs stored in the depth buffer at the same coordinates X,Y), the pixel is not visible and will not be displayed.
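The per-pixel interpolation of Zs can be sketched as follows. This is one common way to interpolate linearly over a triangle (barycentric weights computed from edge functions); the source does not prescribe a specific rasterization scheme, and the function names and the sample triangle are hypothetical.

```python
# Sketch: linear interpolation of screen-space depth over a triangle.
# Assumption: each vertex is a tuple (x, y, zs) already in screen space.

def edge(ax, ay, bx, by, px, py):
    """Twice the signed area of the triangle (a, b, p)."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def interpolate_zs(v0, v1, v2, px, py):
    """Return the interpolated Zs at (px, py), or None outside the triangle."""
    area = edge(v0[0], v0[1], v1[0], v1[1], v2[0], v2[1])
    if area == 0:
        return None  # degenerate triangle covers no pixels
    # Barycentric weights: each is the sub-triangle area opposite a vertex.
    w0 = edge(v1[0], v1[1], v2[0], v2[1], px, py) / area
    w1 = edge(v2[0], v2[1], v0[0], v0[1], px, py) / area
    w2 = edge(v0[0], v0[1], v1[0], v1[1], px, py) / area
    if w0 < 0 or w1 < 0 or w2 < 0:
        return None  # pixel center is not covered by the triangle
    return w0 * v0[2] + w1 * v1[2] + w2 * v2[2]

def is_visible(new_zs, stored_zs):
    """A pixel is visible if it is closer than the value already stored."""
    return new_zs is not None and new_zs < stored_zs

triangle = ((0, 0, 0.2), (4, 0, 0.6), (0, 4, 1.0))
zs_inside = interpolate_zs(*triangle, 1, 1)   # weighted mix of the three vertex depths
zs_outside = interpolate_zs(*triangle, 5, 5)  # None: pixel lies outside the triangle
```

Because Zs is interpolated linearly, the result is exact only if planes in the view space remain planes in the screen space, which is the first constraint placed on the mapping between Zv and Zs.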
Here a pixel is defined as a set of parameters representing an area of the object's surface that corresponds to a point of the raster grid associated with the screen coordinate space. These parameters can include the two-dimensional coordinates of the point in the raster grid, as well as the color and depth values for the area, which are stored at the corresponding locations in a color buffer and in a depth buffer. A pixel is visible if its color and depth values are stored at the corresponding locations in the color buffer and in the depth buffer after scene rendering is completed. A pixel is invisible if its parameters are overwritten by another pixel having a depth value corresponding to a smaller distance from the camera.
The mapping between Zv and Zs is non-linear because of the non-linear nature of a perspective projection. However, this mapping should satisfy certain constraints to avoid gross errors during linear interpolation and to fully utilize the precision of a Z-buffer.
First, to avoid errors during linear interpolation, lines and planes in the view space have to be transformed into lines and planes in the screen space. Second, to maximize precision of the Z-buffer, depth values Zs in the screen space should vary from the smallest to the largest of the values that can be stored in the Z-buffer. Usually, the range between the maximal and minimal values that can be stored in the Z-buffer is mapped to the interval [0,1]; in this case, Zs is normalized to the [0,1] range. An additional constraint, which supports an intuitive notion of depth, is that a point further from the camera in the view space (larger Zv) also has a larger depth in the screen space (larger Zs). As shown by Newman, W. M. and Sproull, R. F. (Principles of Interactive Computer Graphics, 1981, McGraw-Hill, New York), these conditions are satisfied by the following relation between Zv and Zs:
$$Z_s = \frac{f}{f-d}\left(1 - \frac{d}{Z_v}\right) \qquad [1]$$
where f and d are, respectively, the distances from the camera to the far and near clipping planes bounding the view volume in the screen space.
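The mapping can be tried numerically with the sketch below. It assumes that Equation [1] has the commonly cited form Zs = f/(f-d) * (1 - d/Zv); the function name and the sample distances are hypothetical.

```python
# Sketch of the view-space to screen-space depth mapping.
# Assumption: Equation [1] has the form zs = f / (f - d) * (1 - d / zv).

def screen_depth(zv, d, f):
    """Map a view-space depth zv in [d, f] to a screen-space depth in [0, 1]."""
    if not 0 < d < f:
        raise ValueError("expected 0 < d < f")
    if not d <= zv <= f:
        raise ValueError("zv lies outside the view volume")
    return f / (f - d) * (1.0 - d / zv)

d, f = 1.0, 100.0                          # near and far plane distances
near_zs = screen_depth(d, d, f)            # near plane maps to 0
far_zs = screen_depth(f, d, f)             # far plane maps to 1 (up to rounding)
mid_zs = screen_depth((d + f) / 2, d, f)   # halfway point in view space
```

With these sample distances the point halfway through the view volume already maps to a Zs of about 0.99, so the far half of the scene shares roughly one percent of the available depth range.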
Equation [1] is widely used in computer graphics to compute depth in the screen space, store results in the depth buffer and evaluate visibility of rendered surfaces. However, the equation suffers from the disadvantage that it is a non-linear mapping between Zv and Zs. This makes Zs less sensitive to changes in Zv which are close to the far end of the view volume as compared with changes in Zv which are close to the near end of the view volume.
For instance, if the ratio of the distances to the far and near planes equals 100, a small change in Zv close to the near plane causes a change in Zs that is 10,000 times larger than the change caused by the same change in Zv close to the far plane.
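The factor of 10,000 follows from differentiating the mapping, assuming Equation [1] has the form Zs = f/(f-d) * (1 - d/Zv). The sensitivity of Zs falls off with the square of Zv, so the ratio of the sensitivities at the near and far planes is the square of f/d:

```latex
\frac{dZ_s}{dZ_v} = \frac{f\,d}{(f-d)\,Z_v^{2}}, \qquad
\frac{\left.\dfrac{dZ_s}{dZ_v}\right|_{Z_v=d}}{\left.\dfrac{dZ_s}{dZ_v}\right|_{Z_v=f}}
  = \frac{f^{2}}{d^{2}} = 100^{2} = 10{,}000 .
```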
A ratio of distances between the far and near planes in excess of 100 is typical for three-dimensional applications that render large open spaces. For example, a flight simulator graphical program may have a range of visual distances from 0.1 mile for the closest point on the ground to a 10-mile distance for the mountains near the horizon. If the total resolution of the depth buffer is 16 bits (yielding a Zs range from 0 to 65535), a minimum change from Zs=0 to Zs=1 (close to the camera) corresponds to a change in the object's distance of 0.95 inches, while a change of Zs from Zs=65534 to Zs=65535 (far from the camera) corresponds to a change in the object's distance of 797 feet.
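The size of one depth-buffer step can be estimated with the sketch below, which inverts the mapping under the assumption that Equation [1] has the form Zs = f/(f-d) * (1 - d/Zv). The function names are hypothetical. With near and far distances of 1 and 100 miles (the same 100:1 ratio), the sketch gives a first step of just under one inch and a last step of just under 800 feet, in line with the figures quoted above. Under this assumed form the steps scale linearly with the absolute distances, so 0.1 and 10 miles would give values ten times smaller.

```python
# Sketch: view-space distance covered by one step of a fixed-point Z-buffer.
# Assumption: Equation [1] has the form zs = f / (f - d) * (1 - d / zv).

MILE_IN_FEET = 5280.0

def view_depth(zs, d, f):
    """Invert the mapping: recover view-space depth from normalized zs."""
    return d / (1.0 - zs * (f - d) / f)

def step_size(code, d, f, bits=16):
    """View-space distance between buffer values `code` and `code + 1`."""
    top = (1 << bits) - 1  # largest storable value, 65535 for 16 bits
    return view_depth((code + 1) / top, d, f) - view_depth(code / top, d, f)

near, far = 1.0, 100.0  # miles; hypothetical sample with a 100:1 ratio
near_step_inches = step_size(0, near, far) * MILE_IN_FEET * 12
far_step_feet = step_size(65534, near, far) * MILE_IN_FEET
```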
Large ratios of distances to the far and near planes are also found in applications that render closed interior spaces, especially if such interiors include sets of mirrors with interdependent reflections.
As a result, applications that have a high ratio of distances between the far and near planes typically encounter errors when attempting to eliminate hidden surfaces distant from the camera. These errors can be experienced as, e.g., mountains or buildings randomly appearing in front of each other. Application developers have sometimes attempted to solve this problem by blending far regions of the scene with fog color, effectively decreasing visible range.
In some cases, this solution is satisfactory and provides good depth buffer resolution in the areas close to the viewer; however, in other instances blending of distant objects with fog can detract from the realism of the scene. It is often more desirable to provide high resolution throughout the view volume, whether close or far from the camera.
One method for improving resolution of the depth buffer is to directly store the distance from the camera in fixed-point format (this solution is also known as a W-buffer, in reference to the parameter 1/W, supplied for each vertex for perspective-correct texture mapping and usually proportional to 1/Zv). To produce correct results, 1/W has to be interpolated for each pixel with a subsequent high-precision computation of the reciprocal value. These high-precision per-pixel operations make hardware implementations more difficult; generally, the implementations of this method are limited to software.
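The per-pixel work of a W-buffer can be sketched as follows. This is an illustration under stated assumptions, not a description of any specific implementation: W is taken to equal Zv, the stored value is normalized linearly between the near and far distances d and f, and the function name is hypothetical.

```python
# Sketch of a fixed-point W-buffer value for one pixel.
# Assumptions: W equals the view-space distance Zv; the buffer stores
# (W - d) / (f - d) scaled to an unsigned integer of `bits` bits.

def w_buffer_code(inv_w0, inv_w1, t, d, f, bits=16):
    """Buffer value for a pixel a fraction t along a screen-space span."""
    inv_w = (1.0 - t) * inv_w0 + t * inv_w1  # 1/W is linear in screen space
    w = 1.0 / inv_w                          # per-pixel reciprocal (the costly step)
    top = (1 << bits) - 1
    return round((w - d) / (f - d) * top)    # distance is stored linearly

d, f = 1.0, 100.0
start_code = w_buffer_code(1.0 / d, 1.0 / f, 0.0, d, f)  # span starts on the near plane
end_code = w_buffer_code(1.0 / d, 1.0 / f, 1.0, d, f)    # span ends on the far plane
mid_code = w_buffer_code(1.0 / d, 1.0 / f, 0.5, d, f)    # screen-space midpoint
```

Each buffer step then covers the same view-space distance, (f - d) divided by the largest storable value, at the price of one reciprocal per pixel.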
To decrease the complexity of hardware implementation, one method described in U.S. Pat. No. 5,856,829 stores depth as an inverse distance to the camera (1/Zv or 1/W), without computing a high-precision Zv or W for each pixel. While this solution may increase precision in comparison with a standard Z-buffer, it does not typically utilize the full range of depth buffer values and may create visual artifacts for scenes with distances close to the near and far planes.
Another method for improving resolution of a depth buffer employs per-pixel reformatting operations on the Z value after interpolation, such as by using a square root or logarithm function (as shown in U.S. Pat. No. 5,808,618), or by using a block-fixed format (as described in U.S. Pat. No. 5,856,829). In both cases, such per-pixel operations can increase hardware cost and can degrade performance during critical stages of the rendering pipeline. One more difficulty with such conversions into special formats is that the stored value becomes harder to interpret for an application seeking to directly access the depth buffer (for instance, to identify an object stored at a current location by its depth). Also, block-fixed formats can cause sharp precision changes at the boundaries of fixed-format ranges, which may cause undesirable image artifacts.
Another way to increase resolution of the depth buffer includes storing a floating-point value of the depth instead of a fixed-point value. The floating-point value is stored as a combination of exponent and mantissa (with an optional sign bit, if the value can be negative). Floating-point storage improves effective resolution of the inverse W-buffer; however, this improvement is limited by the fact that the inverse W-buffer does not cover the entire available storage range, especially since it excludes from the representation of depth the smallest floating-point values having a rapidly changing exponent. Incomplete compensation of non-linear decrease of depth resolution for small 1/W values can make the precision of an inverse W-buffer significantly lower than that of a fixed-point W-buffer for the same per-pixel storage size.
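The incomplete use of the storage range can be illustrated with a rough count. The sketch assumes a 16-bit IEEE 754 half-precision format with non-negative values only, and a stored value d/W normalized so that the near plane maps to 1 and the far plane maps to d/f. Both assumptions and the function names are hypothetical, since the source does not fix a particular floating-point layout.

```python
# Sketch: share of half-precision values reached by a normalized inverse W-buffer.
# Assumptions: stored value is d / W, ranging from d / f (far) to 1.0 (near);
# storage is IEEE 754 half precision, non-negative values only.
import struct

def half_bits(value):
    """Bit pattern of `value` rounded to IEEE 754 half precision."""
    return struct.unpack("<H", struct.pack("<e", value))[0]

def fraction_of_codes_used(d, f):
    """Fraction of the bit patterns in [0.0, 1.0] that the depth range can reach."""
    lowest = half_bits(d / f)   # smallest stored value, at the far plane
    highest = half_bits(1.0)    # largest stored value, at the near plane
    total = highest + 1         # every non-negative pattern up to 1.0, zero included
    return (highest - lowest + 1) / total

used = fraction_of_codes_used(1.0, 100.0)  # far-to-near ratio of 100
```

For a far-to-near ratio of 100 a bit under half of the available patterns are reachable under these assumptions; the unused patterns are the small values below d/f, where the exponent changes most rapidly.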
Another disadvantage common to both W-buffers and inverse W-buffers is that they require depth values Zv linearly proportional to the distance from the camera for every vertex. 3D rendering APIs, such as OpenGL or Direct3D, do not require a graphics application to provide a parameter proportional to Zv if the application performs the coordinate transformation to the screen space by itself. Thus, a graphics application can omit the parameter W, for instance if it does not require perspective-correct texture mapping, or can use a W that is non-linear in relation to Zv, as in the case of projected textures. Therefore, both W-buffer and inverse W-buffer methods can generally be used only in a subset of 3D rendering applications and cannot be considered universal solutions.