The present application relates to computer graphics and animation systems, and particularly to 3D graphics rendering hardware. Background of the art and a summary of the innovative system and method of the present application are described below. Some of the distinctions of the presently preferred embodiment are particularly noted beginning on pages 8 and 9.
Modern computer systems normally manipulate graphical objects as high-level entities. For example, a solid body may be described as a collection of triangles with specified vertices, or a straight line segment may be described by listing its two endpoints with three-dimensional or two-dimensional coordinates. Such high-level descriptions are a necessary basis for high-level geometric manipulations, and also have the advantage of providing a compact format which does not consume memory space unnecessarily.
Such higher-level representations are very convenient for performing the many required computations. For example, ray-tracing or other lighting calculations may be performed, and a projective transformation can be used to reduce a three-dimensional scene to its two-dimensional appearance from a given viewpoint. However, when an image containing graphical objects is to be displayed, a very low-level description is needed. For example, in a conventional CRT display, a "flying spot" is moved across the screen (one line at a time), and the beam from each of three electron guns is switched to a desired level of intensity as the flying spot passes each pixel location. Thus at some point the image model must be translated into a data set which can be used by a conventional display. This operation is known as "rendering."
The graphics-processing system typically interfaces to the display controller through a "frame store" or "frame buffer" of special two-port memory, which can be written to randomly by the graphics processing system, but also provides the synchronous data output needed by the video output driver. (Digital-to-analog conversion is also provided after the frame buffer.) Such a frame buffer is usually implemented using VRAM memory chips (or sometimes with DRAM and special DRAM controllers). This interface relieves the graphics-processing system of most of the burden of synchronization for video output. Nevertheless, the amounts of data which must be moved around are very sizable, and the computational and data-transfer burden of placing the correct data into the frame buffer can still be very large.
Even if the computational operations required are quite simple, they must be performed repeatedly on a large number of datapoints. For example, in a typical 1995 high-end configuration, a display of 1280×1024 elements may need to be refreshed at 72 Hz, with a color resolution of 24 bits per pixel. If blending is desired, additional bits (e.g. another 8 bits per pixel) will be required to store an "alpha" or transparency value for each pixel. This implies manipulation of more than 3 billion bits per second, without allowing for any of the actual computations being performed. Thus it may be seen that this is an environment with unique data manipulation requirements.
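As a rough check of the figure quoted above, the refresh bandwidth can be computed directly. This is a minimal sketch; the function name is invented for illustration.

```python
def refresh_bandwidth_bits(width, height, bits_per_pixel, refresh_hz):
    """Bits per second needed just to refresh the display."""
    return width * height * bits_per_pixel * refresh_hz

# 1280x1024 display, 24-bit color plus an 8-bit alpha value, 72 Hz refresh:
bw = refresh_bandwidth_bits(1280, 1024, 24 + 8, 72)
print(bw)  # 3019898880, i.e. just over 3 billion bits per second
```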
If the display is unchanging, no demand is placed on the rendering operations. However, some common operations (such as zooming or rotation) will require every object in the image space to be re-rendered. Slow rendering will make the rotation or zoom appear jerky. This is highly undesirable. Thus efficient rendering is an essential step in translating an image representation into the correct pixel values. This is particularly true in animation applications, where newly rendered updates to a computer graphics display must be generated at regular intervals.
The rendering requirements of three-dimensional graphics are particularly heavy. One reason for this is that, even after the three-dimensional model has been translated to a two-dimensional model, some computational tasks may be bequeathed to the rendering process. (For example, color values will need to be interpolated across a triangle or other primitive.) These computational tasks tend to burden the rendering process. Another reason is that since three-dimensional graphics are much more lifelike, users are more likely to demand a fully rendered image. (By contrast, in the two-dimensional images created e.g. by a GUI or simple game, users will learn not to expect all areas of the scene to be active or filled with information.)
FIG. 1A is a very high-level view of other processes performed in a 3D graphics computer system. A three-dimensional image which is defined in some fixed 3D coordinate system (a "world" coordinate system) is transformed into a viewing volume (determined by a view position and direction), and the parts of the image which fall outside the viewing volume are discarded. The visible portion of the image volume is then projected onto a viewing plane, in accordance with the familiar rules of perspective. This produces a two-dimensional image, which is now mapped into device coordinates. It is important to understand that all of these operations occur prior to the operations performed by the rendering subsystem of the present invention. FIG. 1B is an expanded version of FIG. 1A, and shows the flow of operations defined by the OpenGL standard.
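The projection step described above can be sketched as follows, assuming a simple pinhole model with the viewing plane at z = d. The model and function are illustrative only, not part of the disclosed system.

```python
def project(point, d=1.0):
    """Perspective-project an eye-space point (x, y, z) onto the viewing
    plane z = d: screen coordinates scale inversely with depth, per the
    familiar rules of perspective."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point is at or behind the viewpoint")
    return (d * x / z, d * y / z)

# An object twice as far away appears half as large:
near = project((2.0, 4.0, 2.0))  # (1.0, 2.0)
far = project((2.0, 4.0, 4.0))   # (0.5, 1.0)
```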
A vast amount of engineering effort has been invested in computer graphics systems, and this area is one of increasing activity and demands. Numerous books have discussed the requirements of this area; see, e.g., ADVANCES IN COMPUTER GRAPHICS (ed. Enderle 1990-); Chellappa and Sawchuk, DIGITAL IMAGE PROCESSING AND ANALYSIS (1985); COMPUTER GRAPHICS HARDWARE (ed. Reghbati and Lee 1988); COMPUTER GRAPHICS: IMAGE SYNTHESIS (ed. Joy et al.); Foley et al., FUNDAMENTALS OF INTERACTIVE COMPUTER GRAPHICS (2.ed. 1984); Foley, COMPUTER GRAPHICS: PRINCIPLES AND PRACTICE (2.ed. 1990); Foley, INTRODUCTION TO COMPUTER GRAPHICS (1994); Giloi, INTERACTIVE COMPUTER GRAPHICS (1978); Hearn and Baker, COMPUTER GRAPHICS (2.ed. 1994); Hill, COMPUTER GRAPHICS (1990); Latham, DICTIONARY OF COMPUTER GRAPHICS (1991); Magnenat-Thalmann, IMAGE SYNTHESIS: THEORY AND PRACTICE (1988); Newman and Sproull, PRINCIPLES OF INTERACTIVE COMPUTER GRAPHICS (2.ed. 1979); PICTURE ENGINEERING (ed. Fu and Kunii 1982); PICTURE PROCESSING AND DIGITAL FILTERING (2.ed. Huang 1979); Prosise, HOW COMPUTER GRAPHICS WORK (1994); Rimmer, BIT MAPPED GRAPHICS (2.ed. 1993); Salmon, COMPUTER GRAPHICS: SYSTEMS AND CONCEPTS (1987); Schachter, COMPUTER IMAGE GENERATION (1990); Watt, THREE-DIMENSIONAL COMPUTER GRAPHICS (2.ed. 1994); Scott Whitman, MULTIPROCESSOR METHODS FOR COMPUTER GRAPHICS RENDERING; the SIGGRAPH PROCEEDINGS for the years 1980-1994; and the IEEE Computer Graphics and Applications magazine for the years 1990-1994; all of which are hereby incorporated by reference.
Background: Graphics Animation
In many areas of computer graphics a succession of slowly changing pictures is displayed rapidly one after the other, to give the impression of smooth movement, in much the same way as in cartoon animation. In general, the higher the speed of the animation, the smoother (and better) the result.
When an application is generating animation images, it is normally necessary not only to draw each picture into the frame buffer, but also to first clear down the frame buffer, and to clear down auxiliary buffers such as depth (Z) buffers, stencil buffers, alpha buffers and others. A good treatment of the general principles may be found in Computer Graphics: Principles and Practice, James D. Foley et al., Reading Mass.: Addison-Wesley. A specific description of the various auxiliary buffers may be found in The OpenGL Graphics System: A Specification (Version 1.0), Mark Segal and Kurt Akeley, SGI.
In most applications the value written, when clearing any given buffer, is the same at every pixel location, though different values may be used in different auxiliary buffers. Thus the frame buffer is often cleared to the value which corresponds to black, while the depth (Z) buffer is typically cleared to a value corresponding to infinity.
The time taken to clear down the buffers is often a significant portion of the total time taken to draw a frame, so it is important to minimize it.
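The clear-down described above might be sketched as follows. The buffer representation and the particular clear values are illustrative assumptions, not the disclosed implementation.

```python
# Per-buffer clear values: black for the frame buffer, "infinity" (the
# largest representable depth) for the depth (Z) buffer.
BLACK = 0x000000
Z_FAR = 0xFFFFFF

def clear_buffers(num_pixels):
    """Clear down the color and depth buffers before drawing a frame.
    Plain Python lists stand in for dedicated buffer memory; every pixel
    location in a given buffer receives the same value."""
    frame_buffer = [BLACK] * num_pixels
    depth_buffer = [Z_FAR] * num_pixels
    return frame_buffer, depth_buffer
```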
Background: Parallelism in Graphics Processing
Due to the large number of at least partially independent operations which are performed in rendering, many proposals have been made to use some form of parallel architecture for graphics (and particularly for rendering). See, for example, the special issue of Computer Graphics on parallel rendering (September 1994). Other approaches may be found in earlier patent filings by the assignee of the present application and its predecessors, e.g. U.S. Pat. No. 5,195,186, and published PCT applications PCT/GB90/00987, PCT/GB90/01209, PCT/GB90/01210, PCT/GB90/01212, PCT/GB90/01213, PCT/GB90/01214, PCT/GB90/01215, and PCT/GB90/01216, all of which are hereby incorporated by reference.
Background: Pipelined Processing Generally
There are several general approaches to parallel processing. One of the basic approaches to achieving parallelism in computer processing is a technique known as pipelining. In this technique the individual processors are, in effect, connected in series in an assembly-line configuration: one processor performs a first set of operations on one chunk of data, and then passes that chunk along to another processor which performs a second set of operations, while at the same time the first processor performs the first set of operations again on another chunk of data. Such architectures are generally discussed in Kogge, THE ARCHITECTURE OF PIPELINED COMPUTERS (1981), which is hereby incorporated by reference.
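The assembly-line arrangement can be sketched in software as a chain of generators. This is a toy model of the dataflow only; the stage functions are placeholders.

```python
def stage(func, upstream):
    """One processor in the assembly line: apply func to each chunk as it
    arrives from upstream and pass the result downstream."""
    for chunk in upstream:
        yield func(chunk)

def pipeline(chunks, *stage_funcs):
    """Connect the stages in series and run the chunks through them."""
    stream = iter(chunks)
    for f in stage_funcs:
        stream = stage(f, stream)
    return list(stream)

# Two-stage pipeline: conceptually, the first stage can start on chunk 2
# while the second stage is still working on chunk 1.
result = pipeline([1, 2, 3], lambda x: x + 1, lambda x: x * 10)  # [20, 30, 40]
```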
Background: The OpenGL™ Standard
The "OpenGL" standard is a very important software standard for graphics applications. In any computer system which supports this standard, the operating system(s) and application software programs can make calls according to the OpenGL standard, without knowing exactly what the hardware configuration of the system is.
The OpenGL standard provides a complete library of low-level graphics manipulation commands, which can be used to implement three-dimensional graphics operations. This standard was originally based on the proprietary standards of Silicon Graphics, Inc., but was later transformed into an open standard. It is now becoming extremely important, not only in high-end graphics-intensive workstations, but also in high-end PCs. OpenGL is supported by Windows NT™, which makes it accessible to many PC applications.
The OpenGL specification provides some constraints on the sequence of operations. For instance, the color DDA operations must be performed before the texturing operations, which must be performed before the alpha operations. (A "DDA," or digital differential analyzer, is a conventional piece of hardware used to produce a linear gradation of values, such as color values interpolated across a primitive.)
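A DDA of this kind can be modeled as repeated addition of a per-step increment. The sketch below is illustrative only and is not the disclosed hardware.

```python
def dda_span(start_value, end_value, num_pixels):
    """Produce a linear gradation of values across a span of pixels by
    repeated addition of a fixed increment (one add per pixel, with no
    per-pixel multiply)."""
    if num_pixels < 2:
        return [round(start_value)]
    step = (end_value - start_value) / (num_pixels - 1)
    values, v = [], float(start_value)
    for _ in range(num_pixels):
        values.append(round(v))
        v += step
    return values

# Shading one color channel from 0 to 255 across a six-pixel span:
span = dda_span(0, 255, 6)  # [0, 51, 102, 153, 204, 255]
```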
Other graphics interfaces (or "APIs"), such as PHIGS or XGL, are also in use; the OpenGL command set is broadly a superset of most of these.
The OpenGL standard is described in the OPENGL PROGRAMMING GUIDE (1993), the OPENGL REFERENCE MANUAL (1993), and a book by Segal and Akeley (of SGI) entitled THE OPENGL GRAPHICS SYSTEM: A SPECIFICATION (Version 1.0), all of which are hereby incorporated by reference.
FIG. 1B is an expanded version of FIG. 1A, and shows the flow of operations defined by the OpenGL standard. Note that the most basic model is carried in terms of vertices, and these vertices are then assembled into primitives (such as triangles, lines, etc.). After all manipulation of the primitives has been completed, the rendering operations will translate each primitive into a set of "fragments." (A fragment is the portion of a primitive which affects a single pixel.) Again, it should be noted that all operations above the block marked "Rasterization" would be performed by a host processor, or possibly by a "geometry engine" (i.e. a dedicated processor which performs rapid matrix multiplies and related data manipulations), but would normally not be performed by a dedicated rendering processor such as that of the presently preferred embodiment.
One disadvantage of standards such as OpenGL is that they require texturing and other processor-intensive operations to be performed on data before pixel-elimination tests (e.g. depth testing) are performed, which wastes processor time by performing costly texturing calculations on pixels that will be eliminated later in the pipeline. When strict OpenGL ordering is not required, or when the current OpenGL state vector cannot eliminate pixels as a result of the alpha test, it would be much more efficient to eliminate as many pixels as possible before doing these calculations. The present application discloses a method and device for reordering the processing steps in the rendering pipeline, either to accommodate order-specific specifications such as OpenGL, or to optimize throughput by performing processor-intensive operations only on pixels which will actually be displayed.
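The trade-off can be illustrated with a toy model (not the disclosed circuit) in which each fragment carries a depth value and texturing is the costly step.

```python
def render(fragments, depth_buffer, depth_test_first):
    """Process (x, depth, color) fragments against a per-pixel depth
    buffer, counting how many fragments incur the texturing cost."""
    textured = 0
    for x, z, _color in fragments:
        if depth_test_first and z >= depth_buffer[x]:
            continue              # eliminated before any texturing work
        textured += 1             # stand-in for a costly texture lookup
        if z < depth_buffer[x]:
            depth_buffer[x] = z
    return textured

# Two fragments land on the same pixel; the second is hidden behind the
# first. Running the depth test first halves the texturing work here.
frags = [(0, 5, 0xFF0000), (0, 9, 0x00FF00)]
early = render(frags, [float("inf")], depth_test_first=True)   # 1
late = render(frags, [float("inf")], depth_test_first=False)   # 2
```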
Background: Texturing
Texture patterns are commonly used as a way to apply realistic visual detail at the sub-polygon level. See Foley et al., COMPUTER GRAPHICS: PRINCIPLES AND PRACTICE (2.ed. 1990, corr. 1995), especially at pages 741-744; Paul S. Heckbert, "Fundamentals of Texture Mapping and Image Warping," Thesis submitted to Dept. of EE and Computer Science, University of California, Berkeley, Jun. 17, 1994; Heckbert, "Survey of Texture Mapping," IEEE Computer Graphics, November 1986, pp.56ff; all of which are hereby incorporated by reference. Since the surfaces are transformed (by the host or geometry engine) to produce a 2D view, the textures will need to be similarly transformed by a linear transform (normally projective or "affine"). (In conventional terminology, the coordinates of the object surface, i.e. the primitive being rendered, are referred to as an (s,t) coordinate space, and the map of the stored texture is referred to as a (u,v) coordinate space.) The transformation in the resulting mapping means that a horizontal line in the (x,y) display space is very likely to correspond to a slanted line in the (u,v) space of the texture map, and hence many page breaks will occur, due to the texturing operation, as rendering walks along a horizontal line of pixels.
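Such a linear (affine) mapping from display coordinates to texture coordinates can be sketched as follows; the six coefficients are invented for illustration.

```python
def affine_uv(x, y, a=1.0, b=0.5, c=0.0, d=0.25, e=1.0, f=0.0):
    """Affine map from display coordinates (x, y) to texture coordinates
    (u, v). The coefficient values are purely illustrative."""
    u = a * x + b * y + c
    v = d * x + e * y + f
    return (u, v)

# Walking a horizontal display line (constant y, increasing x) moves along
# a slanted line in (u, v) whenever the cross-coefficient d is nonzero --
# the source of the page-break behavior discussed above:
p0 = affine_uv(0, 10)  # (5.0, 10.0)
p1 = affine_uv(1, 10)  # (6.0, 10.25): v changed too, so the (u,v) path slants
```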
Innovative System and Methods
Particularly for low-end users, the cost of graphics hardware is important. One variable expense of graphics hardware is the cost of dedicated DRAM, VRAM, and/or other memory. Higher resolutions and performance generally require more dedicated memory. The user must therefore balance the utility of higher resolution, faster performance, and lower memory cost when selecting hardware. The present application provides for a system which allows high resolutions while requiring less memory, with a slight performance cost.
The presently preferred embodiment provides for a graphics rendering system and method utilizing a unified memory space in place of the normally separate local and frame buffers. It is possible to operate this memory by simply defining a partition between local memory and frame memory. However, the disclosed circuit includes capability for multiplexing depth and color information into the same address space (i.e. the depth-buffer and the back framebuffer can be multiplexed together). To provide this capability, there is a command for stuffing the alpha MSB (for depth/color buffer tagging).
High-Resolution Rendering
This innovative rendering system allows (along with many other capabilities) a slower high-resolution rendering procedure, which permits more resolution to be achieved (for a given local buffer size) than would otherwise be possible. Alternatively, this procedure can be used to reduce memory requirements, e.g. to allow more or bigger texture maps.
In one class of embodiments, each primitive is rendered once to provide depth values (and thus determine which pixels of each primitive are displayed), and a second time to store the color values of pixels which are to be displayed. A specified bit of the color values is used to ensure that depth data and color data, which coexist in a common memory space, cannot be confused or interchanged.
By combining the memory space of the depth buffer and back framebuffer, the user is saved the cost of individual memory spaces for each function, yet retains the high quality output of higher resolution, depth buffered graphics. To accomplish this, however, each primitive must go through a two-pass rendering process, which reduces the rendering throughput.
The first rendering pass computes the depth value for each pixel of each primitive, ignores the color data, and stores the depth value in the combined buffer only if it is less than the value currently in the buffer, i.e. only if the pixel is "on top of" the previous pixel. The second rendering pass recomputes the depth and color values of each pixel of each primitive, and compares each depth value with the depth value for that pixel stored in the combined buffer. If the values are equal, indicating that the current pixel is actually to be displayed, the color data of the pixel is written to the buffer, replacing the depth value. At the same time, one bit of the color data, the most significant alpha bit in the current embodiment, is forced high. Because depth data never has this bit high, this bit effectively functions as a flag which ensures that color data is always distinguished from depth data.
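The two-pass scheme can be sketched as follows. The buffer layout, word width, and flag position are illustrative assumptions modeled on the description above, not the exact disclosed circuit.

```python
# 32-bit buffer words; the most significant bit (standing in for the alpha
# MSB) flags words holding color rather than depth. Depth values are
# generated with this bit low, so the two kinds of data cannot be confused.
COLOR_FLAG = 0x80000000
MAX_DEPTH = 0x7FFFFFFF

def pass_one(buffer, fragments):
    """First pass: keep only the nearest depth at each pixel; ignore color."""
    for x, z, _color in fragments:
        if z < buffer[x]:
            buffer[x] = z

def pass_two(buffer, fragments):
    """Second pass: recompute depth; where it matches the stored winner,
    replace the depth word with the color word, forcing the flag bit high."""
    for x, z, color in fragments:
        if buffer[x] == z:
            buffer[x] = color | COLOR_FLAG

combined = [MAX_DEPTH]                         # one-pixel combined buffer
frags = [(0, 10, 0x123456), (0, 5, 0xABCDEF)]  # the nearer fragment wins
pass_one(combined, frags)                      # combined[0] == 5
pass_two(combined, frags)                      # combined[0] == 0x80ABCDEF
```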