The present invention relates to image generation in general and in particular to rendering engines for generating images that might need to be generated in real time in response to an input of objects in a scene.
Rendering is the process of computing a two-dimensional, viewpoint-dependent image of a three-dimensional geometric model. Rendering is computationally intensive, since a typical geometric model might contain objects that collectively comprise millions of polygons. In the typical rendering engine, a geometry stage first converts mathematical descriptions of arbitrarily complex objects into a collection of polygons, typically triangles, so that subsequent rendering stages only need to deal with simple polygons. The rendering stages then need to simulate the transport of light at various wavelengths emitted from numerous light sources in the geometric model and scattered by surfaces of objects in the geometric model.
Two common methods of handling light-object interactions are known as "ray tracing" and "radiosity" methods. If those methods are implemented in software, the software can take from minutes to days to produce a single high-quality "photorealistic" image of a large geometric model. A typical photorealistic image might require a resolution of 2000 pixels by 1000 pixels and contain a million or more separately specified scene elements (i.e., objects or light sources). While days might be available for rendering some images, many applications, such as computer games, flight simulators, scientific visualizers, computer-aided design/computer-aided engineering (CAD/CAE) applications and medical data visualizers, require real-time rendering of video frames.
With real-time rendering of video frames, the delay allowed between the receipt of a geometric model and the output of the corresponding image is determined, in part, by the frame rate. In some applications, the frame rate can be as low as one frame per minute, so a viable rendering system for that application must produce at least one image per minute if it is to run at an acceptable rate. In many applications, such as computer animation or flight simulation, the rendering system is expected to run at thirty frames or more per second. In some thirty frame/second applications, the delay might need to be even less, as might be the case for a flight simulator or computer game having feedback. With such a system, an operator of the system provides inputs, such as commands to move up, down or sideways, and the corresponding changes to the geometric model are provided to a rendering engine. To avoid the feeling of "sea-sickness" that occurs when an image is out of sync with other stimuli, the rendering engine might have to generate an image from a geometric model in even less time than a frame period.
Existing software rendering engines are generally unable to meet such performance requirements, but some hardware rendering engines have been able to approach these performance requirements, although not always for a practical cost. Generally, in order to generate images of reasonable resolution in 1/30th of a second or less, a hardware approach known as a "graphics pipeline" is used to achieve real-time image generation.
A graphics pipeline comprises a few well-understood stages: the geometry stage, the rendering stage and the composition stage. In a rendering engine with a graphics pipeline, some or all of these stages are implemented in parallel.
In the geometry stage, a database of geometric objects (usually triangles) is read from host memory and transformed from a world coordinate system ("object space") into a view-dependent coordinate system ("screen space"). In some cases, the geometry stage might also perform the conversion of complex objects into simplified objects comprising polygons. Because all polygons can be represented as triangles, and processing triangles is simpler than processing polygons with more than three sides, nearly all existing rendering engines limit geometric objects to triangles. With triangles, each vertex is defined by a coordinate (x, y, z) and a surface normal vector (dx, dy, dz) and might also be defined by surface material properties (such as a coefficient of reflectivity and transmissivity) and the surface may be bound to textures. The geometry stage converts the vertex coordinates and normals from object space representations into their screen space counterparts (with a resolution being a function of the resolution of the output image) and discards objects that fall entirely outside the perimeter of the screen (the "view surface"). The geometry stage also performs lighting calculations, as needed, at the vertices for use by later stages that will interpolate these results at individual pixels.
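The screen-space conversion and the discarding of off-screen objects can be illustrated with a minimal Python sketch. It assumes the vertices have already been projected into a normalized [-1, 1] coordinate range, a common convention that is not specified above, and it handles positions only, not normals or lighting. The function names are hypothetical.

```python
def to_screen(vertex, width, height):
    """Map a vertex from an assumed normalized [-1, 1] range to pixel coordinates."""
    x, y, z = vertex
    # Assumption: screen y grows downward, so the normalized y axis is flipped.
    return ((x + 1.0) * 0.5 * width, (1.0 - y) * 0.5 * height, z)


def outside_view_surface(triangle, width, height):
    """Return True when a screen-space triangle lies entirely off one side of the screen.

    This is a conservative bounding-box test: it never discards a visible
    triangle, but it can keep one that only passes near a screen corner.
    """
    xs = [v[0] for v in triangle]
    ys = [v[1] for v in triangle]
    return max(xs) < 0 or min(xs) >= width or max(ys) < 0 or min(ys) >= height


def geometry_stage(triangles, width, height):
    """Convert triangles to screen space and discard those that are off screen."""
    screen = [tuple(to_screen(v, width, height) for v in tri) for tri in triangles]
    return [tri for tri in screen if not outside_view_surface(tri, width, height)]
```

The output resolution (`width`, `height`) is a parameter of the conversion, which reflects the statement that the screen-space resolution is a function of the output image resolution.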
Once the geometric model is represented as polygons in screen space, the rendering stage calculates the pixel color values that correspond to those polygons. Where the rendering stage is performed by a plurality of rendering engines, the composition stage combines the results from the parallel rendering engines to form the final image. With parallel processing, there are several considerations, such as load balancing and thread independence. The goal of load balancing is to ensure that the work to be done is evenly distributed over all of the parallel threads, to avoid having some threads idle while others are still processing. Thread independence is desirable, since independent threads will not be held up waiting for other threads to reach a certain point in their processing. Some thread dependence might be unavoidable, but such dependence might not always result in additional delays.
One approach to dividing up the work among a plurality of threads is known as the "sort-middle" architecture. In a "sort-middle" approach, the screen space is divided into tiles and each tile is assigned a bin, where the bin corresponds to the work one rendering thread will perform. The objects comprising the geometry stage's output are then allocated to the bins, with each object only being allocated to the bins of the tiles that the object intersects. Within each bin, a Z-comparison (depth comparison) is performed to determine which object is closest to the screen within each pixel of that bin's tile. Shading and blending calculations are then performed to compute the amount of light reflected at each wavelength (typically the three wavelengths r, g and b) by the closest object in each pixel, after textures have been taken into account. The result of this calculation is the color value (e.g., r, g and b values) of the pixel.
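The allocation of objects to tile bins might be sketched as follows. This is an illustrative Python sketch, not a description of any particular product: the function name `bin_triangles`, the square tile shape and the use of a bounding-box test in place of an exact triangle/tile intersection test are all assumptions.

```python
from collections import defaultdict


def bin_triangles(triangles, width, height, tile_size):
    """Allocate each screen-space triangle to the bins of the tiles it may intersect.

    Assumption: a bounding-box test stands in for an exact intersection test,
    so a triangle can land in a bin whose tile it does not actually cover.
    Returns a dict mapping (tile_x, tile_y) to a list of triangle indices.
    """
    bins = defaultdict(list)
    last_tx = (width - 1) // tile_size
    last_ty = (height - 1) // tile_size
    for index, tri in enumerate(triangles):
        xs = [v[0] for v in tri]
        ys = [v[1] for v in tri]
        if max(xs) < 0 or max(ys) < 0 or min(xs) >= width or min(ys) >= height:
            continue  # bounding box is entirely off screen
        # Clamp the bounding box to the range of valid tile indices.
        tx0 = max(0, int(min(xs)) // tile_size)
        tx1 = min(last_tx, int(max(xs)) // tile_size)
        ty0 = max(0, int(min(ys)) // tile_size)
        ty1 = min(last_ty, int(max(ys)) // tile_size)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins[(tx, ty)].append(index)
    return dict(bins)
```

A triangle that straddles a tile boundary appears in more than one bin, and tiles that are absent from the result received no objects at all, so the thread assigned to such a tile has no work.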
Another approach is the "sort-last" architecture wherein, instead of subdividing the screen, the objects are allocated randomly to the bins and each thread does a Z-comparison on the objects in its bin. Each thread computes an entire screen image, albeit an incorrect one containing less than all of the objects in the geometric model. The Z (depth) information is stored along with the pixel color values.
The composition stage in a sort-middle architecture simply combines the subimages for each tile into the final image. The composition stage in a "sort-last" architecture uses the Z information to merge all of the part ("incorrect") images to produce a single correct image. With a sort-last composition stage, a merge engine is used to combine the part images by identifying which objects in which part images overlap which objects in other part images. In some cases, where there are many parallel threads, the composition stage might be done in parallel. Sort-last architectures typically provide better load balancing than sort-middle architectures, since some sort-middle threads might be allocated tiles that contain few or no objects.
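A sort-last composition step can be sketched in a few lines of Python. The sketch assumes each part image is a flat list of (color, z) samples of equal length, that a smaller z means closer to the screen, and that uncovered pixels carry an infinite depth; these conventions and the name `compose` are illustrative assumptions.

```python
FAR = float("inf")  # assumed depth for pixels that no object covers


def compose(part_images):
    """Merge part images pixel by pixel, keeping the closest sample at each pixel.

    Each part image is a list of (color, z) samples. Assumption: a smaller z
    is closer to the screen.
    """
    merged = [(None, FAR)] * len(part_images[0])
    for part in part_images:
        merged = [new if new[1] < old[1] else old for old, new in zip(merged, part)]
    return merged
```

Because the per-pixel comparison depends only on depth, the part images can be merged in any order, which is what allows the objects to be allocated to the bins without regard to screen position.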
The actual process of rendering, in a single thread, using hardware is highly developed and many products are available that can quickly render objects into images. Such products often support standard command sets, such as the OpenGL API (application programming interface), a low-level programming interface that is closely related to the architecture of the graphics pipeline. This standardization makes those products convenient for development of rendering engines. Another command set that is becoming a standard is the Direct3D API developed by Microsoft Corporation. Direct3D is a higher level programming interface that has been used on computers primarily for games.
Inexpensive rendering cards for personal computers are able to render about one million triangles per second. However, if a thirty frame/second video image is needed, the most complicated model that the rendering card could process in real-time would be limited to no more than about 33,000 triangles. Some computer games might be playable with such a limited number of triangles, but generating a photorealistic image with 33,000 triangles is difficult or impossible. Many CAD/CAE applications, such as those used by auto makers, require finite element models with a hundred thousand to a million elements. To render an automotive model with a hundred thousand to a million triangular elements in real-time at thirty frames per second, a rendering engine would need to process three to thirty million triangles per second. A model with five million elements, which may be realistic within a few years, would require a rendering engine that can process 150 million triangles per second.
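The triangle budget figures above follow from two simple relations, shown here as a short Python sketch. The function names are hypothetical, and the sketch assumes every triangle of the model is rendered once per frame.

```python
def triangles_per_frame(triangles_per_second, frames_per_second):
    """Largest model (in triangles) a card can render within one frame period."""
    return triangles_per_second // frames_per_second


def required_rate(model_triangles, frames_per_second):
    """Triangles per second needed to render the whole model every frame."""
    return model_triangles * frames_per_second


FRAME_RATE = 30  # frames per second

budget = triangles_per_frame(1_000_000, FRAME_RATE)    # about 33,000 triangles
small_cad = required_rate(100_000, FRAME_RATE)         # three million per second
large_cad = required_rate(1_000_000, FRAME_RATE)       # thirty million per second
future_cad = required_rate(5_000_000, FRAME_RATE)      # 150 million per second
```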
The bandwidth required at the input to the geometry stage of the graphics pipeline can be estimated from the model size. If a rendering card can process a large number of triangles/second, it cannot be used to its full capacity unless the communication channel to that rendering card can carry all of the data specifying those large numbers of triangles.
Each triangle is represented by at least three vertices, and each vertex by six floating point numbers (a coordinate and a surface normal vector) for a total of seventy-two bytes in the typical specification of a triangle. Surface normal vectors are usually defined at each vertex when the triangle represents a facet of a tessellated surface such as a spherical object. When this is not the case, the data requirements can be reduced by defining a common surface normal vector for all vertices. Another data reduction method is to use "triangle strips", which eliminate redundant vertex definitions and require one additional vertex for each additional triangle. In the worst case, a rendering card that can process one million triangles per second would require a 72 megabyte/second (MB/S) communication bandwidth. Many rendering cards for personal computers communicate over a standard peripheral bus known as the PCI bus, and 72 MB/S is about the achievable capacity of the current PCI bus. A system that renders three million triangles per second requires a bandwidth of roughly 216 MB/S. This bandwidth should be achievable over newer buses, such as Intel's Accelerated Graphics Port (AGP) bus.
Even if a rendering system could render 30 to 150 million triangles/second, it would require a communication channel that could handle from two to eleven gigabytes/second (GB/S), which most certainly would require parallel threads for communication. While several high-performance rendering engines that use parallel processing are available, they are often quite costly and not scalable.
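The bandwidth estimates can be reproduced with the following Python sketch. It assumes four-byte (single-precision) floating point numbers, which is what the seventy-two byte figure implies for three vertices of six numbers each, and it counts a megabyte as one million bytes. The names are hypothetical.

```python
BYTES_PER_FLOAT = 4      # assumption: single precision, consistent with 72 bytes per triangle
FLOATS_PER_VERTEX = 6    # coordinate (x, y, z) plus surface normal (dx, dy, dz)
BYTES_PER_VERTEX = BYTES_PER_FLOAT * FLOATS_PER_VERTEX


def independent_triangle_bytes(triangle_count):
    """Worst case: every triangle carries its own three vertices."""
    return triangle_count * 3 * BYTES_PER_VERTEX


def triangle_strip_bytes(triangle_count):
    """One strip: three vertices for the first triangle, one more for each additional one."""
    if triangle_count == 0:
        return 0
    return (triangle_count + 2) * BYTES_PER_VERTEX


def worst_case_bandwidth(triangles_per_second):
    """Bytes per second needed to feed a card with independent triangles."""
    return independent_triangle_bytes(triangles_per_second)


pci_class = worst_case_bandwidth(1_000_000)      # 72 MB/S
agp_class = worst_case_bandwidth(3_000_000)      # 216 MB/S
low_end = worst_case_bandwidth(30_000_000)       # about 2 GB/S
high_end = worst_case_bandwidth(150_000_000)     # about 11 GB/S
```

For long strips the cost approaches one vertex per triangle, roughly a third of the worst-case figure, although real models rarely decompose into a single strip.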
The present invention overcomes several disadvantages of the prior art methods and apparatus for generating images.
In one embodiment of an image generator according to the present invention, the image generator is organized into a plurality of rendering engines, each of which renders an image of a part scene and provides the part image to a merge engine associated with that rendering engine. The image is a part image in that it usually contains less than all of the objects in the image to be rendered. The merge engine merges the part image from its associated rendering engine with the part image provided by a prior merge engine (the prior neighbor in a merge engine sequence) and provides the merged part image to a next merge engine (the next neighbor in a merge engine sequence). One or more merge engines are designated as output merge engines and these output merge engines output a merged part image that is (a portion of) the ultimate output of the image generator, the full rendered image. Each merge engine performs its merge process on the pixels it has from its rendering engine and from its prior neighbor merge engine, without necessarily waiting for all of the pixels of the part image or the merged part image.
One advantage of the image generator is that the operations of the merge engines are pipelined, since they operate on pixels as they arrive, if an operation can be done on the arrived pixels, instead of waiting for all pixels to arrive.
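The pipelined behavior of a merge engine sequence can be imitated with Python generators, which pass each pixel along as soon as it is available. This is only a sketch under stated assumptions: the merge rule is taken to be a depth comparison in which a smaller z is closer, the first merge engine is assumed to pass its rendering engine's pixels through unchanged, and `rendering_engine`, `merge_engine` and `build_chain` are hypothetical names.

```python
def rendering_engine(name, samples, log):
    """Stand-in for a rendering engine: emits (color, z) pixels one at a time."""
    for sample in samples:
        log.append(name)  # record when each pixel is actually produced
        yield sample


def merge_engine(prior_stream, local_stream):
    """Merge pixels pairwise as they arrive, without buffering a whole part image."""
    for prior_px, local_px in zip(prior_stream, local_stream):
        # Assumed merge rule: keep the closer sample; ties favor the prior neighbor.
        yield local_px if local_px[1] < prior_px[1] else prior_px


def build_chain(rendering_streams):
    """Chain one merge engine per rendering engine; the last is the output merge engine."""
    streams = iter(rendering_streams)
    merged = next(streams)  # assumption: the first engine has no prior neighbor
    for local in streams:
        merged = merge_engine(merged, local)
    return merged
```

Requesting the first output pixel from the chain pulls exactly one pixel from each rendering engine, so downstream merge engines begin work before any upstream part image is complete.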
In one variation of the basic image generator, the merge sequence is dynamic. In one such embodiment, the merge engines are coupled to a switch network where the switch network determines which merge engines are prior neighbors and next neighbors to which merge engines. Preferably, the switch network makes those determinations based on the objects being provided to the individual rendering engines.
The image generator can be used for a variety of image compositing and "two-and-a-half-D" applications, which combine 3D rendering, 2D overlays and special operations including color correction, dissolves, blending and other transformations.