1. Field of the Invention
The invention relates generally to digital image processing and the display of digitally generated images.
The invention relates more specifically to the problem of creating raster-based, high-resolution animated images in real time with the aid of a 3D rendering engine.
2. Description of the Related Art
In recent years, the presentation and pre-presentation processing of visual imagery has shifted from what was primarily an analog electronic format to an essentially digital format.
Unique problems come into play in the digital processing of image data and the display of such image data.
The more prominent problems include providing adequate storage capacity for digital image data and maintaining acceptable data throughput rates while using hardware of relatively low cost. In addition, there is the problem of creating a sense of realism in digitally generated imagery, particularly in animated forms of such imagery.
The visual realism of imagery that is generated by digital video game systems, by simulators and the like can be enhanced by providing special effects such as, but not limited to, making real-time changes in the orientation and/or shadowing and/or highlighting of various objects, smoothing or sharpening the contours of various objects at different times, and so forth.
Visual realism can be further enhanced by projecting 3-dimensional (3D) surface definitions from a model space onto a 2-dimensional (2D) image plane and rotating, scaling or otherwise manipulating the 3-dimensional surface definitions in real time prior to their projection onto the 2-dimensional image plane.
Visual realism can be additionally enhanced by increasing the apparent resolution of a displayed image so that it has a smooth photography-like quality rather than a grainy disjoined-blocks appearance of the type found in low-resolution computer-produced graphics of earlier years.
Visual realism can be even further enhanced by increasing the total number of different colors and/or shades in each displayed frame of an image so that, in regions where colors and/or shades are to change in a smooth continuum by subtle degrees of hue/intensity, the observer perceives such a smooth photography-like variation of hue/intensity rather than a stark and grainy jump from one discrete color/shade to another.
Although bit-mapped computer images originate as a matrix of discrete lit or unlit pixels, the human eye can be fooled into perceiving an image having the desired photography-like continuity if the displayed matrix of independently-shaded (and/or independently colored) pixels has dimensions of approximately 500-by-500 pixels or better at the point of display and a large variety of colors and/or shades on the order of roughly 24 bits-per-pixel or better.
The human brain can be tricked into perceiving a displayed 2-dimensional moving image as being somewhat 3-dimensional in nature if a sufficient number of cues are generated in real-time to support such perception. These cues include but are not limited to:
(a) drawing images along angled lines of perspective to create a sense of depth;
(b) shading images to simulate 3-dimensional lighting effects including shadows and reflections;
(c) allowing displayed objects to rotate so as to show their side and back surfaces;
(d) allowing displayed objects to appear to move forward and back relative to the viewer by appropriately scaling their size; and
(e) allowing displayed objects to move in front of one another as if they were 3-dimensional and had all the associated properties of the real objects they portray.

(a) native-surface color component values such as R, G, and B that are respectively attached to each point;
(b) a blending factor referred to herein as `A` that is respectively attached to each point and is used for blending the native-surface color component values with respective prior frame-buffer values and/or respective texturizing values;
(c) depth-adjusted texture mapping coordinates that are respectively attached to each point and are referred to herein as `u/w` and `v/w`, where u and v are coordinates of a bitmapped 2-dimensional texture image defined using depth-independent coordinates, the texture image having its own R, G, B, and/or A values for each of its points; and
(d) a homogeneity factor referred to herein as `1/w`.
The above set of visual cues implies that each rotating object having reflective surfaces needs to have its surrounding 3D visual environment wrapped about its surface and distorted in accordance with the contours of that surface in order to create the illusion of 3-dimensional reflection. The above set of visual cues further presupposes that translucent moving objects passing in front of other objects should appear to translucently pass through the imagery of the object behind.
Carrying out all these 3D cuing operations in real-time can be quite complicated and difficult, particularly if an additional constraint is added that the implementing hardware has to be of relatively small size and low cost.
Compound systems are being proposed that have real-time 3-dimensional object defining and manipulating means as well as other means that contend for access to a shared system memory and for access to the shared resources of system CPUs.
The compound nature of such systems places a strain on system memory to deliver (or to store) time-critical data to (or from) devices or modules that need to operate on a real-time basis.
An example of time-critical data is video data that may be needed on a real-time basis, within the time window of a horizontal raster line for example, in order to provide real-time display for an interactive game or an interactive simulator.
The compound nature of such systems also increases the likelihood that a minor software error (bug) in one software module will induce an unintended write to a system critical register or memory location and bring the whole system down.
The proposed compound systems have so many hardware and software functionalities that they strain the throughput capabilities (data bandwidth) of the memory-access management subsystem and complicate the tasks of the memory-access management subsystem. The memory-access management subsystem now needs to arbitrate among a larger number of contenders for memory access.
The added functionalities of the proposed compound systems additionally strain the throughput capabilities and complicate the tasks of any system CPUs that have to supervise the activities of the image manipulating and rendering means on a real-time basis. (The term "CPUs" refers here to a general-purpose data processing subsystem which may be implemented either in a centralized unit format, as for example a single truly-central processing unit; or which may be implemented in a plural units format, such as in a parallel processing system.)
A system architecture is needed for reducing contention among plural potential requesters for system memory access.
A system architecture is needed for reducing contention by plural software modules for access to the limited resources of system CPUs.
A methodology is needed for simultaneously satisfying the needs of multiple, time-critical processes such as those of a real-time video display subsystem and those of a real-time animation subsystem.
A methodology is needed for reducing the likelihood that a wayward software or hardware module will bring down the entire system by unintentionally writing to a system critical register or memory location.
In general, a goal of 3D computer graphics is to create a 2D projection on a cathode-ray tube ("CRT") screen of a three-dimensional model as viewed from a predetermined viewpoint in three-dimensional model space. One aspect of such a projection is the need to keep track of which objects are in front of other objects, and which are behind, when viewed from the viewpoint. This knowledge is necessary to ensure that, for example, a building in the foreground will properly occlude a building in the distance. This aspect of the rendering process is known as "occlusion mapping".
One popular technique to perform occlusion mapping uses a construct known as a "z buffer". A standard z buffer linearly associates a number called the "z value", representing the distance from the observer (depth in the scene relative to a projection plane), with each pixel drawn on the screen. When a first object is projected, attributes of its pixels (such as color) are stored in a "frame buffer", and the z value associated with each pixel is separately stored in the z buffer. If a second object from the model subsequently projects onto the same pixel, the new object's z value is compared against the z value already stored for that pixel, and only if the new value is less (representing an object closer to the viewer) will the new pixel be drawn.
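The z-buffer comparison described above can be sketched as follows. This is a minimal illustration, not the implementation of any particular system: the names `make_buffers` and `draw_rect` are hypothetical, objects are simplified to constant-depth rectangles, colors are plain integers, and a smaller z value is assumed to mean "nearer to the viewer".

```python
import math

def make_buffers(width, height, background=0):
    """Return (frame_buffer, z_buffer); every depth starts at infinity (nothing drawn yet)."""
    frame = [[background] * width for _ in range(height)]
    zbuf = [[math.inf] * width for _ in range(height)]
    return frame, zbuf

def draw_rect(frame, zbuf, x0, y0, x1, y1, z, color):
    """Draw a constant-depth rectangle covering x0 <= x < x1 and y0 <= y < y1."""
    for y in range(y0, y1):
        for x in range(x0, x1):
            # Assumed convention: a smaller z is nearer, so draw only if z is less
            # than the depth already stored for this pixel.
            if z < zbuf[y][x]:
                zbuf[y][x] = z
                frame[y][x] = color

frame, zbuf = make_buffers(6, 1)
draw_rect(frame, zbuf, 0, 0, 4, 1, z=2.0, color=1)  # nearer object, drawn first
draw_rect(frame, zbuf, 2, 0, 6, 1, z=5.0, color=2)  # farther object, drawn second
print(frame[0])  # [1, 1, 1, 1, 2, 2]
```

In the overlap (x = 2 and x = 3) the farther object fails the comparison, so the nearer object's pixels survive regardless of drawing order.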
FIG. 8 illustrates the rendering of Object 1 and Object 2 at different distances or z values from a projection plane or image plane, considered for purposes of illustration to be located at z=0 in the model space. In FIG. 8, object 1 is projected and rendered first. Object 2 is rendered second. The z buffer prevents pixels of object 2 from being written to the frame buffer in the locations where object 1 has already written pixels with a lesser z value. Thus, object 2 appears in the ultimately displayed image to be behind object 1, as desired.
Z buffers can be implemented in either hardware or software. The numbers stored can be either floating point or integer values. Any number of bits can be devoted to the z values. In general, the more bits that are devoted to storing the z value, the finer the resolution in distance that can be achieved. Because z values represent the depth of an object in a scene, z values can be more generally referred to as "depth values", and z-buffers can be more generally referred to as "depth buffers". Also, in particular implementations, depth values can be increasing with increasing depth, or can be decreasing with increasing depth. Since the invention is not restricted to one such implementation or the other, depth values sometimes are referred to herein as being "farther" or "nearer" to the viewpoint than other depth values.
Another feature of 3D graphics systems is the ability to map a texture onto an object with a perspective-correct mapping, as seen in FIG. 9. In the simplest form, a texture can be thought of as a decal that is applied to the surface of an object, such as a design on the surface of a cube. In FIG. 9, 902 designates the cube in a model space 903 and 904 designates "texture space" containing a "texture map" 905 which is to be applied to all three visible surfaces 906, 908 and 910 of the cube as part of the rendering process. Since the surface 906 of the cube 902 is parallel to the projection plane (considered for purposes of this illustration to be at z=0 in the model space 903), the texture map 905 can be applied directly onto that surface. Surface 908 is at an angle to the projection plane, so a perspective transformation of the texture map 905 is needed before applying it to the surface 908. Such a map, as transformed, is illustrated at 912. Similarly, a different perspective transformation of the texture map 905 is required before applying it to the surface 910; such a map, as transformed, is illustrated at 914. The final image, as projected onto the projection plane and with perspective-correct texturing applied, is illustrated as 916 in FIG. 9. It can be seen that the shape of the decal on the side of the box is warped as the box is rotated to preserve the illusion of three-dimensionality in the 2D projection.
In order to achieve perspective-correct texture mapping, graphics rendering systems traditionally perform projection calculations using a 4×4 matrix representing the transformation to be performed. 4×4 matrices are also used for many other kinds of transformations in the 3D model space, all as discussed in Foley et al., "Computer Graphics, Principles and Practice," 2d. ed. (Addison-Wesley: 1991), especially at pp. 213-226 and 253-281. The entire Foley text is incorporated herein by reference.
4×4 matrix transformations depend on the representation of points in the 3D model space using "homogeneous coordinates", in which a fourth coordinate, w, is added to the traditional three spatial coordinates x, y and z. Two sets of homogeneous coordinates are considered to refer to the same point in 3-space if one is a multiple of the other. Thus (x,y,z,w) refers to the same point as (x/w, y/w, z/w, 1), in which representation the fourth coordinate ("1") can be dropped. The process of dividing through by w (and optionally dropping the last coordinate) is referred to herein as the process of "homogenizing" the point, after which the representation is referred to as a "homogenized" representation of the point. Similarly, the process of multiplying through by any non-zero and non-unity w value is referred to herein as the process of "de-homogenizing" the point, after which the representation is referred to herein as "de-homogenized". The value (1/w) is referred to herein as a "homogeneity factor", because the point is "homogenized" by multiplying each of the coordinates by (1/w). The value w is referred to herein as a "homogeneity divisor", because the point is homogenized by dividing each of the coordinates by w. The term "homogeneity value" as used herein includes both homogeneity factors and homogeneity divisors, since it does not imply the function (e.g. multiplication or division) by which it is to be applied to the other coordinates to achieve homogeneity. As will be seen, the homogeneity value for a point is related to its depth in the scene.
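The homogenizing and de-homogenizing operations defined above can be illustrated with a short sketch. The function names `homogenize` and `dehomogenize` are chosen here for illustration only and do not come from any library.

```python
def homogenize(point):
    """Divide (x, y, z, w) through by the homogeneity divisor w and drop the trailing 1."""
    x, y, z, w = point
    if w == 0:
        raise ValueError("w must be non-zero to homogenize a point")
    return (x / w, y / w, z / w)

def dehomogenize(point3, w):
    """Multiply a homogenized (x, y, z) through by a non-zero w, restoring a 4-tuple."""
    if w == 0:
        raise ValueError("w must be non-zero")
    x, y, z = point3
    return (x * w, y * w, z * w, w)

# Two 4-tuples that are multiples of one another name the same point in 3-space.
print(homogenize((2.0, 4.0, 6.0, 2.0)))   # (1.0, 2.0, 3.0)
print(homogenize((4.0, 8.0, 12.0, 4.0)))  # (1.0, 2.0, 3.0)
```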
Thus the projection calculations traditionally performed naturally yield a homogeneity value ((1/w) or w) for each point projected onto an image plane. Traditional texture mapping, which maps a texture onto a planar polygon of the model, utilizes the homogeneity values as follows.
Initially, each vertex of the model space polygon is assigned, in addition to attribute values and its model space coordinates (x,y,z), a pair of depth-independent coordinates (u,v) into a depth-independent texture space. The texture space is considered herein to be "depth-independent" because it is defined with only two Cartesian coordinates (u and v). For each vertex, homogenized image space coordinates (x/w, y/w) are calculated using homogeneous transformations. This calculation yields the homogeneity value 1/w for the vertex, which is applied to the depth-independent texture space coordinates (u,v) for the vertex to generate "depth-adjusted" texture coordinates (u/w, v/w). These can be thought of as coordinates into a "depth-adjusted texture space".
Next, for each i'th pixel of the polygon as projected onto the image plane, in addition to calculating its new image space coordinates (x_i/w_i, y_i/w_i) by interpolation, the depth-adjusted coordinates (u_i/w_i, v_i/w_i) into the depth-adjusted texture space are also calculated by interpolation. The homogeneity value 1/w_i is also interpolated for the i'th pixel from the homogeneity values of the polygon vertices. Because the predefined texture map is indexed in depth-independent texture space and not depth-adjusted texture space, the texture coordinates (u_i/w_i, v_i/w_i) for the i'th pixel are then converted back to depth-independent texture space by multiplying through by w_i. This yields depth-independent coordinates (u_i, v_i) into depth-independent texture space. The corresponding texture value T_i can then be retrieved and/or calculated from the predefined texture map and applied to the polygon as projected onto the image plane.
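The per-pixel procedure just described can be sketched for the simplest case of a span between two projected vertices. This is an illustrative sketch only: the helper names `lerp` and `perspective_correct_uv` are hypothetical, each vertex is given as a tuple (u, v, w), and `t` is assumed to be the fractional position of the pixel along the span in image space.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t

def perspective_correct_uv(vertex0, vertex1, t):
    """Return depth-independent (u_i, v_i) for the pixel at image-space fraction t."""
    u0, v0, w0 = vertex0
    u1, v1, w1 = vertex1
    inv_w0, inv_w1 = 1.0 / w0, 1.0 / w1

    # Interpolate the depth-adjusted coordinates u/w and v/w, and the factor 1/w.
    u_over_w = lerp(u0 * inv_w0, u1 * inv_w1, t)
    v_over_w = lerp(v0 * inv_w0, v1 * inv_w1, t)
    inv_w = lerp(inv_w0, inv_w1, t)

    # Multiply through by w_i = 1 / (1/w_i) to return to depth-independent space.
    w_i = 1.0 / inv_w
    return (u_over_w * w_i, v_over_w * w_i)

# Near vertex (w = 1) to far vertex (w = 3): the image-space midpoint maps to
# u = 0.25 rather than 0.5, because more of the texture lies toward the far end.
print(perspective_correct_uv((0.0, 0.0, 1.0), (1.0, 1.0, 3.0), 0.5))
```

Interpolating u and v directly in image space would give 0.5 at the midpoint; that discrepancy is the distortion that the division by the interpolated 1/w removes.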
In general, at least for one-point perspective projections onto an image plane perpendicular to the z axis, the homogeneity value produced by the 4×4 matrix calculation is related to the depth coordinate z of the point in model space by a linear relationship of the form

w = αz + β,

where α and β are constants which depend on such variables as the position of the image plane on the z-axis in model space and the chosen center of projection (COP). For projections onto the plane z=d with COP=(0,0,0), it can be shown that w=z/d.
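The special case w = z/d can be checked numerically. The sketch below assumes the standard perspective matrix for projection onto the plane z = d with the center of projection at the origin, whose last row is (0, 0, 1/d, 0); the function names are illustrative. In this case α = 1/d and β = 0.

```python
def perspective_matrix(d):
    """4x4 matrix for projection onto the plane z = d with COP at the origin (assumed form)."""
    return [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0 / d, 0.0],
    ]

def transform(matrix, point):
    """Multiply a 4x4 matrix by a column vector (x, y, z, w)."""
    return tuple(sum(matrix[r][c] * point[c] for c in range(4)) for r in range(4))

d = 2.0
x, y, z, w = transform(perspective_matrix(d), (4.0, 6.0, 8.0, 1.0))
print(w)                       # 4.0, which equals z / d
print((x / w, y / w, z / w))   # (1.0, 1.5, 2.0): the point lands on the plane z = d
```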
But while this simple relationship between z and w is well known, conventional systems do not use homogeneity values for occlusion mapping, only for texture mapping. In order to maximize texture-mapping resolution in a given graphics rendering system, most conventional systems texture map each polygon separately. This enables the homogeneity values for the polygon (including its vertices and each of its interior pixels) to be scaled so as to occupy the full range of numeric values that can be carried by the hardware. The above texture mapping procedure is performed using these full-scale (1/w) values. Accordingly, while the homogeneity values of different surface regions of the polygon might have been initially related to the depth of the surface region in the overall rendered scene, after scaling, they are related only to their depths in the scene relative to the other surface regions of the same polygon.
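The effect of per-polygon scaling can be illustrated as follows. This sketch reflects one possible reading of the scaling step: the function name `scale_to_full_range` and the 16-bit full-scale value of 65535 are assumptions made for illustration, not details taken from any particular system.

```python
def scale_to_full_range(inv_ws, full_scale=65535.0):
    """Scale a polygon's 1/w values so the largest one equals full_scale (assumed scheme)."""
    scale = full_scale / max(inv_ws)
    return [value * scale for value in inv_ws], scale

# Polygon A is near the viewer (w = 1 and 2); polygon B is far away (w = 8 and 16).
near_scaled, near_scale = scale_to_full_range([1.0, 0.5])
far_scaled, far_scale = scale_to_full_range([0.125, 0.0625])
print(near_scaled)  # [65535.0, 32767.5]
print(far_scaled)   # [65535.0, 32767.5], indistinguishable from polygon A

# Texture mapping still works, because the same scale multiplies u/w and 1/w
# and cancels in the division.
u, inv_w = 0.75, 0.0625
recovered_u = (u * inv_w * far_scale) / (inv_w * far_scale)
print(recovered_u)  # 0.75
```

After scaling, the two polygons carry identical values despite lying at very different depths, which is why the scaled values cannot be compared across polygons for occlusion purposes.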
Thus, a traditional 3D graphics system computes two parameters for each pixel: a z value, which is used in the z buffer, and a 1/w value, which is used for perspective-correct texture mapping. The two are treated entirely independently. Additional complexity and cost are incurred in computing both the z value and the 1/w value. Further, computing both values, rather than only one of them, limits the speed with which an image may be rendered and displayed. Therefore, it is desirable to provide an apparatus and method in which only one of the two values need be calculated, but without significant loss of precision in either the texture-mapping or occlusion mapping operations.