Computer processing of 3D graphics can be considered as a three stage pipeline, comprising the general steps of tesselation, geometry and rendering. Tesselation involves the creation of a description of an object, and converting the description to a set of triangles. The geometry stage involves transformation, that is the scaling and rotation of the triangles, and lighting, that is the determination of the brightness, shading and texture characteristics of each triangle. The rendering stage involves the calculation of all attributes of the pixels forming the triangles, e.g. color, light, depth and texture. The rendering stage creates a two dimensional display from the triangles created in the geometry stage.
Background information concerning these functions can be obtained in the articles "PC Graphics Reach New Level: 3D", by Yong Yao, published in Microprocessor Report, Vol 10, No 1, Jan. 22, 1996, "Talisman: Commodity Realtime 3D Graphics for PC", by Jay Torborg and Jim Kajiya, Sigraph 1996, as well as at pages 866-873 of the text "Fundamentals of Interactive Computer Graphics", by J. D. Foley and A. Van Dam, Addison-Wesley Publishing Company, copyright 1982.
Theoretically, 3D graphics can be controlled entirely by software. However even a central processor (CPU) with performance 10 times faster that current fast CPUs (e.g. the Pentium 200) may not be fast enough for professional level 3D graphics or high quality game programs, as will be shown below. For that reason, a 3D graphics accelerator (hardware) is desirable to speed up 3D graphics processing.
In the rendering stage, data is processed pixel by pixel. It is required that a rendering engine must calculate all attributes of the pixel, color, light depth and texture, and therefore must be able to process millions of polygons per scene and construct a quality 2D representation in real time, for animation.
A typical geometry stage is done using the vertex of each triangle, and requires 100 floating point operations per second (FLOPS) per vertex. For two million vertices, this means that 200 million FLOPS are required per second. This is twice as fast as a current low cost CPU can perform, and one quarter of the maximum claimed speed of recent media processors (800 million FLOPS per second). However as CPU speed appears to be doubling each year, it is considered that it is not currently necessary to increase the speed of the geometry stage by special hardware.
The required memory bandwidth for a 1024.times.768 24-bit display at a 75 Hz refresh rate, with a Z-buffer (a buffer for data describing the third dimension Z) and using texture mapping, is more than 4 Gbytes (4,000 million bytes). Maximum required memory bandwidth is estimated to be 12,000 million bytes per second. This is far in excess of current available bandwidth, estimated to be 500 million bytes per second. Even with doubling of memory bandwidth every three years, it is clear that memory bandwidth represents a significant processing bottleneck.
Thus the geometry stage is often contained in a 3D graphics accelerator.
Synchronous dynamic random access memories (SDRAMs) have been found to be very efficiently accessed, if addresses belong to the same row; only one clock cycle per access is required. When a following address access is in a different row, a delay is incurred for the accessing circuit to switch to the new row, which delay is called page fault (pageFault). In current SDRAM technology, the page fault cost is 7-10 clock cycles (e.g. pageFault=7).
Reference is now made to FIG. 1, which illustrates pertinent parts of a computer used to carry out 3D graphics processing. Illustrated is a processor (CPU) 1, operating in accordance with programs stored in random access memory (RAM) 3, communicating with each other via a bus 5. The CPU also communicates with a graphics subsystem 7, to which a 3D accelerator 9 is connected. Video RAM 11 is connected to the 3D accelerator, or to the graphics subsystem for storage of processed and to-be-processed display data as will be described below. The graphics subsystem and 3D accelerator are connected to a digital to analog (D/A) converter 13 which converts 3D digital signals to be displayed in 2D received from the accelerator, or other display data received from the graphics subsystem, to analog signals such as the well known standard RGB signals, which are applied to display 15.
Reference is no made to FIG. 2, which illustrates a triangle which is prepared for rendering. The three dimensional coordinates x,y and z of each of the vertices A, B and C of the triangle as well as other attributes as will be described below are stored in RAM 11.
FIG. 3 illustrates the steps of a method of rendering in accordance with the prior art. Firstly the triangle parameters are calculated in a triangle setup stage: slope of the color, texture coordinates depth and light.
The next stage, referred to as edge walking, is considering pixel parameters along the left and right edges of the triangle, by calculating the starting and ending values for a current scan line, and the span attributes. Span is defined by a starting x coordinate value, an ending x coordinate value, starting and ending values of color, light, depth and texture. This is followed by incrementing in the y coordinate direction and repeating the above steps. After pixel calculation, the data is stored in a memory such as VRAM, and is passed through the D/A converter to the display.
As noted earlier, each step in the y direction incurs a pageFault delay time cost. In the prior art, drawing each span begins with a page fault. For an n-pixel-span triangle, the extra penalty for page fault is n*(pageFault), which is typically n*7. For example for a 50 pixel triangle, with a 10 pixel base and 10 pixel height, the page fault time penalty is 10*7=70 clock cycles. If one memory access contains 4 pixels (2 bytes per pixel, a memory width of 2*4=8 bytes), for all 50 pixels calculated without page fault, 50/4=12.5 clock cycles are required. However with page fault, the penalty is 5 times larger than that without considering page fault.
One proposed solution to decrease page fault is to organize a memory using tiling, i.e. the memory is organized in pages, as shown in FIG. 4A. In the example shown in FIG. 4A, the data description of a triangle 15 is contained entirely within a memory page 17. Thus, only a single page fault is incurred, which is the advantage of tiling.
However, if as shown in FIG. 4B the triangle crosses the page boundary, it has been found that there is an enormously high page fault, e.g. 70 clock cycles for a 50 pixel triangle, which is highly undesirable.