1. The Field of the Invention
The present invention relates generally to graphical rendering devices and systems. Specifically, the invention relates to devices and systems for conducting highly realistic three-dimensional graphical renderings.
2. The Relevant Art
Graphical rendering involves the conversion of one or more object descriptions to a set of pixels that are displayed on an output device such as a video display or image printer. Object descriptions are generally mathematical representations that model or represent the shape and surface characteristics of the displayed objects. Graphical object descriptions may be created by sampling real world objects and/or by creating computer-generated objects using various editors.
In geometric terms, rendering requires representing or capturing the details of graphical objects from the viewer's perspective to create a two-dimensional scene or projection representing the viewer's perspective in three-dimensional space. The two-dimensional rendering facilitates viewing the scene on a display device or means such as a video monitor or printed page.
A primary objective of object modeling and graphical rendering is realism, i.e., a visually realistic representation that is life-like. Many factors impact realism, including surface detail, lighting effects, display resolution, display rate, and the like. Due to the complexity of real-world scenes, graphical rendering systems are known to have an insatiable thirst for processing power and data throughput. Currently available rendering systems lack the performance necessary to make photo-realistic renderings in real-time.
To increase rendering quality and reduce storage requirements, surface details are often separated from the object shape and are mapped onto the surfaces of the object during rendering. The object descriptions including surface details are typically stored digitally within a computer memory or storage medium and referenced when needed.
One common method of representing three-dimensional objects involves combining simple graphical objects into a more realistic composite model or object. The simple graphical objects, from which composite objects are built, are often referred to as primitives. Examples of primitives include triangles, surface patches such as Bézier patches, and voxels.
Voxels are volume elements, typically cubic in shape, that represent a finite, three-dimensional space similar to bitmaps in two-dimensional space. Three-dimensional objects may be represented using a primitive comprising a three-dimensional array of voxels. A voxel object is created by assigning a color and a surface normal to certain voxel locations within the voxel array while marking other locations as transparent.
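By way of non-limiting illustration, a voxel object of the kind described above may be sketched as follows. The sketch assumes a sparse mapping in place of a dense three-dimensional array, so that any location absent from the mapping is treated as transparent; the names VoxelObject, set_voxel, is_transparent, and lookup are hypothetical and do not appear in the description.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Coord = Tuple[int, int, int]
Color = Tuple[int, int, int]
Normal = Tuple[float, float, float]


@dataclass
class VoxelObject:
    """Sparse voxel array: locations absent from `voxels` are transparent."""
    size: Coord
    voxels: Dict[Coord, Tuple[Color, Normal]] = field(default_factory=dict)

    def set_voxel(self, pos: Coord, color: Color, normal: Normal) -> None:
        # Assign a color and a surface normal to one voxel location.
        if not all(0 <= p < s for p, s in zip(pos, self.size)):
            raise IndexError(f"{pos} lies outside the voxel array {self.size}")
        self.voxels[pos] = (color, normal)

    def is_transparent(self, pos: Coord) -> bool:
        # Any location that was never assigned is considered transparent.
        return pos not in self.voxels

    def lookup(self, pos: Coord) -> Optional[Tuple[Color, Normal]]:
        # Return (color, normal) for a nontransparent voxel, else None.
        return self.voxels.get(pos)
```

A dense array with a per-voxel transparency flag would serve equally well; the sparse form is chosen here only to keep the sketch short.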
Voxel objects reduce the geometry bandwidth and processing requirements associated with rendering. For example, objects represented with voxels typically have smaller geometry transform requirements than similar objects constructed from triangles. Despite this advantage, existing voxel rendering algorithms are typically complex and extremely hardware intensive. A fast algorithm for rendering voxel objects with low hardware requirements would reduce the geometry processing and geometry bandwidth requirements of rendering by allowing certain objects to be represented by voxel objects instead of many small triangles.
As mentioned, rendering involves creating a two-dimensional projection representing the viewer's perspective in a three-dimensional space. One common method of creating a two-dimensional projection involves performing a geometric transform on the primitives that comprise the various graphical objects within a scene. Performing a geometric transform changes any coordinates representing objects from an abstract space known as a world space into actual device coordinates such as screen coordinates.
After a primitive such as a triangle has been transformed to a device coordinate system, pixels are generated for each pixel location which is covered by that primitive. The process of converting graphical objects to pixels is sometimes referred to as rasterization or pixelization. Texture information may be accessed in conjunction with pixelization to determine the color of each of the pixels. Because more than one primitive may be covering any given location, a z-depth for each pixel generated is also calculated, and is used to determine which pixels are visible to the viewer.
FIGS. 1a and 1b depict a simplified example of graphical rendering. Referring to FIG. 1a, a graphical object 100 may be rendered by sampling attributes such as object color, texture, and reflectivity at discrete points on the object. The sampled points correspond to device-oriented regions, typically round or rectangular in shape, known as pixels 102. The distance between the sampled points is referred to herein as a sampling interval 104. The sampled attributes, along with surface orientation (i.e. a surface normal), are used to compute a rendered color 108 for each pixel 102. The rendered colors 108 of the pixels 102 preferably represent what a perspective viewer 106 would see from a particular distance and orientation relative to the graphical object 100.
As mentioned, the attributes collected by sampling the graphical object 100 are used to compute the rendered color 108 for each pixel 102. The rendered color 108 differs from the object color due to shading, lighting, and other effects that change what is seen from the perspective of the viewer 106. The rendered color 108 may also be constrained by the selected rendering device. The rendered color may be represented by a set of numbers 110 designating the intensity of each of the component colors of the selected rendering device, such as red, green, and blue on a video display or cyan, magenta, yellow, and black on an inkjet printer.
As the graphical object 100 is rendered with each frame, the positioning and spacing of the discrete sampling points (i.e., the pixels 102) projected onto the graphical object 100 determine what is seen by the perspective viewer 106. One method of rendering, referred to as ray tracing, involves determining the position of the discrete sampling points by extending a grid 111 of rays 112 from a focal point 114 to find the closest primitive each ray intersects. Since the rays 112 are diverging, the spacing between the rays 112, and therefore the size of the grid 111, increases with increasing distance. Ray tracing, while precise and accurate, is generally not used in real-time rendering systems due to the computational complexity of currently available ray tracing algorithms.
The grid 111, depicted in FIG. 1a, is a set of regularly spaced points corresponding to the pixels 102. The points of the grid 111 lie in an image plane perpendicular to a ray axis 115. The distance of each pixel 102 from a reference plane perpendicular to the ray axis 115, such as the grid 111, is known as the pixel depth or z-depth. The distance or depth of the graphical object 100 changes the level of detail seen by the perspective viewer 106. Relatively distant objects cover a smaller rendering area on the display device, resulting in a reduced number of rays 112 that reach the graphical object 100, and an increased sampling interval 104.
Visual artifacts occur when the spacing between the rays 112 results in the sampling interval 104 being too large to faithfully capture the details of the graphical object 100. A number of methods have been developed to eliminate visual artifacts related to large sampling intervals. One method, known as super-sampling, involves rendering the scene at a higher resolution than the resolution used by the output device, followed by a smoothing or averaging operation to combine multiple rendered pixels into a single output pixel.
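By way of illustration only, the averaging step of super-sampling may be sketched as a simple box filter. The sketch assumes a grayscale image stored as a list of rows whose dimensions are exact multiples of the super-sampling factor; the function name downsample and the parameter factor are hypothetical.

```python
from typing import List


def downsample(image: List[List[float]], factor: int) -> List[List[float]]:
    """Average each factor x factor block of rendered pixels into one output pixel.

    Assumes len(image) and len(image[0]) are exact multiples of `factor`.
    """
    height, width = len(image), len(image[0])
    output = []
    for oy in range(0, height, factor):
        row = []
        for ox in range(0, width, factor):
            # Gather the super-sampled pixels that fall under one output pixel.
            block = [image[oy + dy][ox + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        output.append(row)
    return output
```

Other smoothing kernels could be substituted for the uniform average; the box filter is merely the simplest choice.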
Another method, developed to represent objects at various distances and sampling intervals faithfully, involves creating multiple models of a given object. Less detailed models are used when an object is distant, while more detailed models are used when an object is close. Texture information may also be stored at multiple resolutions. During rendering, the texture map appropriate for the distance from the viewer is utilized.
The graphical objects, and portions thereof, that are visible to a viewer are dependent upon the perspective of the viewer. Referring to FIG. 1b, a graphical scene 150 may include a variety of the graphical objects 100, some of which may be visible while others may be obstructed. Unobstructed objects are often designated as foreground objects 100a, while partially obstructed objects may be referred to as background objects 100b. Within the graphical scene 150, completely obstructed objects may be referred to as non-visible objects.
During rendering, the graphical scene 150 is converted to rendered pixels on a rendering device for observance by an actual viewer. Each rendered pixel preferably contains the rendered color 108 such that the actual viewer's visual perception of each graphical object 100 is that of the perspective viewer 106.
A small percentage of the graphical objects 100 may be visible within a particular graphical scene. For example, the room shown within the graphical scene 150 may be one of many rooms within a database containing an entire virtual house. The rendering of non-visible objects and pixels unnecessarily consumes resources such as processing cycles, memory bandwidth, memory storage, and function specific circuitry. Since the relative relationship of graphical objects changes with differing perspectives, for example as the perspective viewer 106 walks through a virtual house, the ability to dynamically determine and prune non-visible objects and pixels improves rendering performance.
Ray casting is a method to determine visible objects and pixels within a graphical scene 150 as shown in FIG. 1a. Ray casting is one method of conducting ray tracing that advances (casts) one ray for each pixel within the graphical scene 150 from the perspective viewer 106. With each cast, one or more graphical objects are tested against each ray to see if the ray has "collided" with the object, an extremely processing-intensive procedure.
Z-buffering is another method that is used to determine visible pixels. Pixels are generated from each potentially visible object and stored within a z-buffer. A z-buffer typically stores a depth value and a pixel color value at a memory location corresponding to each x, y position within the graphical scene 150. A pixel color value is overwritten with a new value only if the new pixel depth is less than the depth of the currently stored pixel.
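By way of illustration only, the depth test described above may be sketched as follows. The class name ZBuffer, the method write, and the choice of an infinite initial depth are assumptions made for the sketch.

```python
import math
from typing import Tuple

Color = Tuple[int, int, int]


class ZBuffer:
    """Stores one depth value and one color value per x, y position."""

    def __init__(self, width: int, height: int,
                 background: Color = (0, 0, 0)) -> None:
        # Every location starts infinitely far away, so any pixel passes.
        self.depth = [[math.inf] * width for _ in range(height)]
        self.color = [[background] * width for _ in range(height)]

    def write(self, x: int, y: int, z: float, color: Color) -> bool:
        """Overwrite the stored pixel only if the new pixel is nearer."""
        if z < self.depth[y][x]:
            self.depth[y][x] = z
            self.color[y][x] = color
            return True
        return False
```

Pixels may arrive in any order; after all potentially visible pixels have been written, each location holds the color of the nearest one.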
Referring to FIG. 2, a method of rendering known as post z-buffer shading and texturing defers shading and texturing operations within a rendering pipeline 200 and therefore does not texture or shade non-visible pixels. In a typical rendering system, the color of the pixels is calculated prior to z-buffering. In a post z-buffer shading and texturing system, such as the rendering pipeline 200, final color calculations are not performed until after the z-buffering operation. Deferred shading and texturing eliminates the memory lookups and processing operations associated with shading and texturing non-visible pixels and thereby facilitates increased system efficiency.
The rendering pipeline 200 includes a display memory 210 and a graphics engine 220 comprised of a triangle converter 230, a z-buffer 240, and a shading and texturing engine 250. The rendering pipeline 200 also includes a frame buffer 260. In the depicted embodiment, the display memory 210 receives and provides various object descriptors 212 that describe the graphical objects 100.
The display memory 210 preferably contains descriptions of those objects that are potentially visible in the graphical scene 150. With scene changes, the object descriptors 212 may be added or removed from the display memory 210. In some embodiments, the display memory 210 contains a database of the object descriptors 212, for example, a database describing an entire virtual house.
Some amount of simple pruning may be conducted on objects within the display memory 210, for example, by software running on a host processor. Simple pruning may be conducted so that the graphical objects that are easily identified as non-visible are omitted from the rendering process. For example, those graphical objects 100 that are completely behind the perspective viewer 106 may be omitted or removed from the display memory 210.
The graphics engine 220 retrieves the object descriptors 212 from the display memory 210 and presents them to the triangle converter 230. In the depicted embodiment, the object descriptors 212 define the vertices of a triangle or set of triangles and their associated attributes such as the object color. Typically, these attributes are interpolated across the face of the triangle to provide a set of potentially visible pixels 232.
The potentially visible pixels 232 are received by the z-buffer 240 and processed in the manner previously described to provide the visible pixels 242 to the shading and texturing engine 250. The shading and texturing engine 250 textures and/or shades the visible pixels 242 to provide rendered pixels 252 that are collected by the frame buffer 260 to provide one frame of pixels 262. The framed pixels 262 are typically sent to a display system for viewing.
One difficulty in conducting post z-buffer shading and texturing is the increased complexity required of the z-buffer. The z-buffer is required to contain additional information relevant to shading and texturing in addition to the pixel depth. The z-buffer is often a performance critical element, in that each pixel is potentially updated multiple times, requiring increased bandwidth. The increased size and bandwidth requirements on the z-buffer have limited the use of post z-buffer shading and texturing within graphical systems.
One prior art method to reduce the size of the z-buffer is shown in FIG. 3. The method divides a screen 300 into tiles 310. The tiles 310 and the screen 300 consist of a plurality of scanlines 320. Each tile 310 is rendered as if it were the entire screen 300, thus requiring a tile-sized z-buffer. While a tile-sized z-buffer requires less memory, a tile-sized z-buffer increases complexity related to sorting, storing, accessing, and rendering the object descriptors 212 within the display memory 210. The increased complexity results from objects that overlap more than one tile.
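By way of illustration only, the source of the added complexity can be seen by listing the tiles touched by a primitive. The sketch assumes square tiles and an inclusive screen-space bounding box for each primitive; the function name tiles_overlapped and its parameters are hypothetical.

```python
from typing import List, Tuple


def tiles_overlapped(bbox: Tuple[int, int, int, int],
                     tile_size: int) -> List[Tuple[int, int]]:
    """Return the (tile_x, tile_y) indices touched by an inclusive bounding box.

    bbox is (x_min, y_min, x_max, y_max) in screen coordinates.
    """
    x_min, y_min, x_max, y_max = bbox
    return [(tx, ty)
            for ty in range(y_min // tile_size, y_max // tile_size + 1)
            for tx in range(x_min // tile_size, x_max // tile_size + 1)]
```

A primitive whose list contains more than one tile must be stored or referenced once per tile, which is the bookkeeping burden noted above.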
While many advances have been made to graphical rendering algorithms and architectures, including those depicted in the graphical pipeline 200, real-time rendering of photo-realistic life-like scenes requires the ability to render greater geometric detail than is sustainable on currently available graphical rendering systems.
Therefore, what is generally needed are methods and apparatus to conduct efficient graphical rendering. Specifically, what is needed is a graphical system that renders voxel primitives efficiently. The ability to render voxel objects efficiently increases the detail achievable in real-time graphical rendering systems.
What is also needed is a graphical system that renders very detailed scenes with extensive depth complexity, without tying up external memory interfaces with z-buffer data traffic. A z-buffering apparatus and method that facilitates large tiles, supports a high pixel throughput, is compact enough to reside entirely on-chip, and reduces external memory bandwidth requirements would facilitate such a system.
In addition to better z-buffering, a method and apparatus are needed that reduce the bandwidth load on the z-buffer. Specifically, what is needed is a method and apparatus that reduces the generation of non-visible pixels prior to z-buffering.
In addition to more intelligent pixel generation, rendering highly realistic scenes requires accessing large amounts of texture and world description data. Specifically, what is needed is an apparatus and method to maximize the efficiency of internal and external memory accesses. Such a method and apparatus would preferably achieve increased realism by facilitating larger stores of texture data within low-cost external memories, while maintaining a high data throughput within the rendering pipeline.
Lastly, what is needed is a graphical processing architecture that facilitates combining the various elements of the present invention into an efficient rendering pipeline that is scalable in performance.
The apparatus of the present invention has been developed in response to the present state of the art, and in particular, in response to the problems and needs in the art that have not yet been fully solved by currently available graphical rendering systems and methods. Accordingly, it is an overall object of the present invention to provide an improved method and apparatus for graphic rendering that overcomes many or all of the above-discussed shortcomings in the art.
To achieve the foregoing objects, and in accordance with the invention as embodied and broadly described herein in the preferred embodiments, an apparatus and method for improved graphical rendering is described. The apparatus and method facilitate increased rendering realism by supporting greater geometric detail, efficient voxel rendering, larger amounts of usable texture data, higher pixel resolutions including super-sampled resolutions, increased frame rates, and the like.
In a first aspect of the invention, a method and apparatus for casting ray bundles is described that casts entire bundles of rays relatively large distances. The ray bundles are subdivided into smaller bundles and casting distances as the rays and bundles approach a graphical object. Each bundle advances in response to a single test that is conducted against a proximity mask corresponding to a particular proximity. Sharing a single proximity test among all the rays within a bundle greatly reduces the processing burden associated with ray tracing. Individual rays are generated when a ray bundle is within close proximity to the object being rendered. The method and apparatus for casting ray bundles efficiently calculates the first ray intersections with an object and is particularly useful for voxel objects.
In a second aspect of the invention, a method and apparatus for gated pixelization (i.e., selective pixel generation) is described that conducts z-buffering at a coarse depth resolution using minimum and maximum depths for a pixel set. In one embodiment, the method and apparatus for gated pixelization maximizes the utility of reduced depth resolution by shifting the range of depths stored within the z-buffer in coordination with the depth of the primitives being processed. The method and apparatus for gated pixelization also reduces the bandwidth and storage burden on the z-buffer and increases the throughput of the pixel generators.
In a third aspect of the invention, a method and apparatus for z-buffering pixels is described that stores and sorts the pixels from an area of the screen, such as a tile, into relatively small regions, each of which is processed to determine the visible pixels in each region. The method and apparatus facilitates high throughput z-buffering, efficient storage of pixel auxiliary data, as well as deferred pixel shading and texturing.
In a fourth aspect of the invention, an apparatus and method for sorting memory accesses related to graphical objects is described that increases the locality of memory references and thereby increases memory throughput. In the presently preferred embodiment, access requests for a region of the screen are sorted and stored according to address, then accessed page by page to minimize the number of page loads that occur. Minimizing page loads maximizes the utilization of available bandwidth of graphical memory interfaces.
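By way of illustration only, the effect of sorting access requests by address may be sketched as follows. The page size of 4096 and the names group_by_page and count_page_loads are assumptions made for the sketch; the model charges one page load whenever consecutive accesses fall in different pages.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

PAGE_SIZE = 4096  # hypothetical page size, in address units


def group_by_page(addresses: Iterable[int],
                  page_size: int = PAGE_SIZE) -> List[Tuple[int, List[int]]]:
    """Sort requests so that all accesses to one page are issued together."""
    pages: Dict[int, List[int]] = defaultdict(list)
    for address in addresses:
        pages[address // page_size].append(address)
    return [(page, sorted(pages[page])) for page in sorted(pages)]


def count_page_loads(addresses: Iterable[int],
                     page_size: int = PAGE_SIZE) -> int:
    """Count page loads when requests are issued in the given order."""
    loads = 0
    current_page = None
    for address in addresses:
        page = address // page_size
        if page != current_page:
            loads += 1
            current_page = page
    return loads
```

For the request stream 0, 5000, 100, 9000, 4200, 200, issuing the requests as they arrive changes pages on every access, whereas issuing them page by page touches each of the three pages once.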
The various aspects of the invention are combined in a pipelined graphics engine designed as a core of a graphics subsystem. In the presently preferred embodiment, graphical rendering is tile-based and the pipelined graphics engine is configured to efficiently conduct tile-based rendering.
The graphics engine includes a set of pixel generators that operate in conjunction with one or more occlusion detectors. The pixel generators include voxel ray tracers, which use the method and apparatus for casting ray bundles to greatly reduce the number of computations required to determine visible voxels. In the preferred embodiment, the voxel objects are stored and processed in a compressed format.
The voxel ray tracers generate pixels from voxel objects by calculating ray collisions for the voxel objects being rendered. Proximity masks are preferably generated prior to pixel generation. Each proximity mask indicates the voxel locations that are within a certain distance of a nontransparent voxel. The proximity masks are brought in from external memory and cached as needed during the rendering process. An address that references the color of the particular voxel impinged upon by each ray is also calculated and stored within a pixel descriptor.
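By way of illustration only, generation of a proximity mask may be sketched as a dilation of the set of nontransparent voxels. The sketch assumes that distance is measured as Chebyshev distance (the largest per-axis offset), which the description does not specify; the function name proximity_mask and its parameters are hypothetical.

```python
from typing import Iterable, Set, Tuple

Coord = Tuple[int, int, int]


def proximity_mask(occupied: Iterable[Coord], size: Coord,
                   distance: int) -> Set[Coord]:
    """Return the in-bounds voxel locations within `distance` of any
    nontransparent voxel, using Chebyshev distance (an assumption)."""
    mask: Set[Coord] = set()
    offsets = range(-distance, distance + 1)
    for x, y, z in occupied:
        for dx in offsets:
            for dy in offsets:
                for dz in offsets:
                    p = (x + dx, y + dy, z + dz)
                    # Discard locations that fall outside the voxel array.
                    if all(0 <= c < s for c, s in zip(p, size)):
                        mask.add(p)
    return mask
```

With a distance of zero the mask reduces to the set of nontransparent voxels themselves, which corresponds to a collision test rather than a proximity test.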
The voxel ray tracers conduct ray bundle casting to efficiently determine any first ray intersections with a particular voxel object. The voxel ray tracers are preferably configured to conduct perspective ray tracing where the rays diverge with each cast.
Ray tracing commences by initializing the direction of the rays in the voxel object's coordinate system, based on the voxel object's orientation in world space and the location of the viewer. The casting direction of each ray bundle is represented by a single directional vector. A bundle width and height corresponding to a screen region represent the bundle size. In the preferred embodiment, a top level bundle may comprise 100 or more rays.
Each ray bundle is advanced by casting the bundle in the direction specified by the directional vector a selected casting distance. A proximity mask is selected for testing that preferably indicates a proximity to the object surface that corresponds with the selected casting distance. The single test against the properly selected proximity mask ensures that none of the rays in a bundle could have intersected the object between the last test and the current test.
A positive proximity test indicates that at least one ray is within a certain distance of the object surface. In response to a positive proximity test, the ray bundle is preferably subdivided into smaller bundles that are individually advanced, tested, and subdivided until each bundle is an individual ray. The individual rays are also advanced and tested against a collision mask that indicates impingement of the ray on a non-transparent voxel of the object of interest. Upon impingement, a color lookup address for the impinged voxel is calculated, and stored along with x and y coordinates in the pixel descriptor.
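By way of illustration only, the advance, test, and subdivide cycle may be sketched under strong simplifying assumptions that are not part of the description: the rays are parallel and travel along the +z axis rather than diverging, each bundle is a square of n by n rays with n a power of two, a bundle of size n is cast a distance of n voxels, and proximity is measured as Chebyshev distance. Under these assumptions one test of the bundle's corner location against the mask for distance n - 1 covers every ray in the bundle over the whole cast. The names dilate and cast_bundles are hypothetical.

```python
from typing import Dict, Set, Tuple

Coord = Tuple[int, int, int]


def dilate(occupied: Set[Coord], d: int) -> Set[Coord]:
    """Locations within Chebyshev distance d of a nontransparent voxel."""
    r = range(-d, d + 1)
    return {(x + dx, y + dy, z + dz)
            for (x, y, z) in occupied for dx in r for dy in r for dz in r}


def cast_bundles(occupied: Set[Coord], width: int,
                 depth: int) -> Dict[Tuple[int, int], int]:
    """Return {(x, y): z} giving the first voxel hit by each ray.

    One ray is cast per (x, y) in a width x width screen; `width` must be
    a power of two. Rays that leave the volume produce no entry.
    """
    # One mask per bundle size; the mask for n == 1 is the collision mask.
    masks = {}
    n = width
    while n >= 1:
        masks[n] = dilate(occupied, n - 1)
        n //= 2

    hits: Dict[Tuple[int, int], int] = {}
    stack = [(0, 0, 0, width)]  # (x0, y0, z, bundle size n)
    while stack:
        x0, y0, z, n = stack.pop()
        if z >= depth:
            continue  # the whole bundle has left the volume
        if (x0, y0, z) not in masks[n]:
            # Negative test: no ray of the bundle can hit anything in the
            # next n steps, so the whole bundle advances on a single test.
            stack.append((x0, y0, z + n, n))
        elif n == 1:
            hits[(x0, y0)] = z  # an individual ray impinged on a voxel
        else:
            # Positive test: subdivide into four smaller bundles.
            half = n // 2
            for ox in (0, half):
                for oy in (0, half):
                    stack.append((x0 + ox, y0 + oy, z, half))
    return hits
```

Extending the sketch to diverging rays would require the bundle footprint, and therefore the mask distance, to grow with each cast; that extension is not attempted here.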
The method and apparatus for casting ray bundles has several advantages and is particularly useful for voxel objects. Casting is very efficient, in that the majority of tests performed (for each ray that intersects the surface) are shared by many other rays within each bundle of which the ray was a member. The proximity mask information is compact, particularly when compressed, and may be cached on-chip for increased efficiency. The algorithm is also memory friendly, in that only those portions of the object that are potentially visible need be brought onto the chip; that is, efficiency is maintained with partial view rendering. Perhaps the greatest advantage, particularly when conducted in conjunction with voxel objects, is a substantial reduction in the number of, and the bandwidth required for, geometry calculations within highly detailed scenes. The recursive subdividing nature of the algorithm also facilitates parallel execution, which in certain embodiments facilitates computing multiple ray intersections per compute cycle.
The pixel generators, such as the voxel ray tracers, generate potentially visible pixels, working in conjunction with the occlusion detector. The occlusion detector conducts depth checking at a coarse depth resolution in order to gate the pixel generators, thereby allowing the pixel generators to skip generating pixels for locations known to be occluded by a previously processed pixel. The preferred embodiment of the occlusion detector performs a parallel comparison of all the depth values within a region to a given value, and returns a mask indicating the pixel locations that are occluded at that depth. The pixel generators use the mask information to generate only pixels that are not known to be occluded. Using the occlusion detectors to conduct pixel gating reduces the overall processing and storage burden on the z-buffer.
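By way of illustration only, the comparison performed by the occlusion detector may be sketched as follows. The sketch assumes that stored depths are rounded away from the viewer to a coarse step so that the detector never reports an occlusion that does not exist; the step of 16.0 and the names quantize_far and occlusion_mask are hypothetical, and the comparison is written as a loop rather than in parallel.

```python
import math
from typing import List

COARSE_STEP = 16.0  # hypothetical coarse depth resolution


def quantize_far(depth: float, step: float = COARSE_STEP) -> float:
    """Round a depth away from the viewer to the coarse resolution, so a
    stored occluder is never treated as nearer than it really is."""
    return math.ceil(depth / step) * step


def occlusion_mask(coarse_depths: List[List[float]],
                   candidate_min_depth: float) -> List[List[bool]]:
    """Compare every stored depth in a region against one value.

    True marks a location known to be occluded for any pixel at or beyond
    `candidate_min_depth`; False means the pixel must still be generated.
    """
    return [[stored < candidate_min_depth for stored in row]
            for row in coarse_depths]
```

Because of the conservative rounding, a location holding an occluder at depth 17 is stored as 32 and is not reported as occluding a candidate at depth 20; such pixels are generated and later resolved by the z-buffer at full resolution.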
In the preferred embodiment, the occlusion detector is used in conjunction with front-to-back rendering of the graphical primitives that comprise a scene. In certain embodiments, the occlusion detector is capable of shifting the depth range in which occlusions are detected. Depth shifting focuses the available resolution of the occlusion detector on a limited depth range. Depth shifting is preferably conducted in conjunction with depth ordered rendering. Information from the occlusion detector may also be used to gate the processing of geometric primitives.
The pixel generators and the occlusion detectors coordinate to conduct gated pixelization and provide potentially visible pixels to a sorting z-buffer. The sorting z-buffer includes a region sorter, a region memory, and a region-sized z-buffer. The region sorter sorts the potentially visible pixels according to their x, y coordinates within a screen or tile to provide sorted pixels. The sorted pixels corresponding to each region within a graphical scene or tile are received and processed by a region-sized z-buffer to provide the visible pixels.
In the preferred embodiment, the region sorter is a hardware bucket sorter. The bucket sorter operates by storing the pixels as they arrive in temporary buffers, which are transferred in parallel into the region memory when full. Additional stages of bucket sorting may be conducted by sorting pixels stored within the region memory.
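By way of illustration only, a single stage of the bucket sorter may be sketched in software as follows. The sketch assumes square regions, one temporary buffer per region, and pixels represented as (x, y, z) tuples; the class name RegionBucketSorter and its methods are hypothetical, and the parallel transfer is modeled as a single list extension.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, float]   # (x, y, z); other fields omitted
RegionKey = Tuple[int, int]


class RegionBucketSorter:
    """Buffers arriving pixels per region and flushes full buffers."""

    def __init__(self, region_size: int, buffer_capacity: int) -> None:
        self.region_size = region_size
        self.buffer_capacity = buffer_capacity
        self.buffers: Dict[RegionKey, List[Pixel]] = defaultdict(list)
        self.region_memory: Dict[RegionKey, List[Pixel]] = defaultdict(list)

    def add(self, pixel: Pixel) -> None:
        # Select the bucket from the pixel's x, y coordinates.
        key = (pixel[0] // self.region_size, pixel[1] // self.region_size)
        self.buffers[key].append(pixel)
        if len(self.buffers[key]) == self.buffer_capacity:
            self._flush(key)

    def _flush(self, key: RegionKey) -> None:
        # Transfer the whole temporary buffer into region memory at once.
        self.region_memory[key].extend(self.buffers[key])
        self.buffers[key].clear()

    def finish(self) -> Dict[RegionKey, List[Pixel]]:
        # Flush partially filled buffers once all pixels have arrived.
        for key in list(self.buffers):
            if self.buffers[key]:
                self._flush(key)
        return dict(self.region_memory)
```

After finish() returns, the pixels of each region can be handed to a region-sized z-buffer one region at a time.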
Sorting the pixels into regions facilitates the use of a very small z-buffer at the core of the sorting z-buffer. The screen regions corresponding to the region-sized z-buffer are preferably smaller than the tiles typical of rendering systems. Sorting the pixels into regions also facilitates the use of larger tiles. Larger tiles reduce the number of graphic primitives that overlap more than one tile.
In one embodiment, using a region-sized z-buffer within the sorting z-buffer facilitates rendering without tiling. Using a region-sized z-buffer has the additional advantage of facilitating dynamic adjustment of the size of the tile, as well as handling more than one pixel in the z-buffer for a given location within the region, a useful feature for processing semi-transparent pixels. Using a region-sized z-buffer also facilitates handling a large number of pixels per cycle. The pixels may be randomly placed within a tile and need not be stored or accessed in any particular order.
In the preferred embodiment, the bucket sorter stores the received pixels by conducting a parallel transfer to the region memory. Since the pixels may originate from the same primitive, the received pixels often have a certain amount of spatial coherence. In the preferred embodiment, the bucket sorter exploits spatial coherence by conducting a first level of bucket sorting as the pixels arrive. Additional levels of bucket sorting may be performed by recursively processing the contents of the region memory.
A further stage of the sorting z-buffer is the pixel combiner. The pixel combiner monitors the pixels provided by the sorting z-buffer. In those instances where super-sampled anti-aliasing is performed, combining is conducted on those pixels that can be combined without loss of visual quality. Combining is preferred for super-sampled pixels that reference the same texture. Combining reduces the load on the colorization engine and the anti-aliasing filter.
The sorting z-buffer provides visible pixels to a colorization engine. The colorization engine colorizes the pixels to provide colorized pixels. In the present invention, colorizing may comprise any operation that affects the rendered color of a pixel. In one embodiment, the colorizing of pixels includes shading, texturing, normal perturbation (i.e. bump mapping), as well as environmental reflectance mapping. Colorizing only those pixels that are visible reduces the processing load on the colorization engine and reduces the bandwidth demands on external texture memory.
The colorization engine colorizes pixels using a set of pixel colorizers, an attribute request sorter, and a set of attribute request queues. The graphics engine may also include or be connected to a pixel attribute memory containing pixel attributes that are accessed by the pixel colorizers in conjunction with colorization. Voxel color data is preferably stored in a packed array so that only nontransparent voxels on the surface of an object need be stored. Surface normal information is also stored along with the color.
The attribute request sorter routes and directs the attribute requests relevant to pixel colorization to the various attribute request queues. In one embodiment, the attribute request sorter sorts the attribute requests according to the memory page in which the requested attribute is stored, and the attribute request sorter routes the sorted requests to the pixel attribute memory.
Sorting the attribute requests increases the performance and/or facilitates the use of lower cost storage by increasing the locality of memory references. In one embodiment, increasing the locality of memory references facilitates using greater quantities of slower, less costly dynamic random access memory (DRAM) within a memory subsystem while maintaining equivalent data throughput.
In the preferred embodiment, the last portion in the pipeline is the anti-aliasing filter. In those instances where super-sampling is performed, multiple super-sampled pixels are combined to provide rendered pixels. The rendered pixels are stored in the frame buffer and used to provide a high quality graphical rendering.
The various elements of the graphics engine work together to accomplish high performance, highly detailed rendering using reduced system resources. Pixel descriptors are judiciously generated in the pixelizers by conducting gated pixelization. Each pixel descriptor, though grouped with other pixels of the same screen region, flows independently through the various pipeline stages. Within each pipeline stage, the number of processing units operating in parallel is preferably scalable in that each pixel is directed to an available processing unit.