In the applicant's UK Patent No. 2281682, there is described a 3-D rendering system for polygons in which each primitive object in the 3-D scene is defined as a set of infinite surfaces. Each elementary area (e.g. pixel) of the screen on which an image is to be displayed has a ray projected through it from a viewpoint into the 3-D scene. The location of the intersection of the projected ray with each surface is then determined. From these intersections, it is then possible to determine whether any intersected surface is visible at that elementary area. The elementary area is then shaded for display in dependence on the results of the determination.
The system can be implemented in a pipeline type processor comprising a number of cells, each of which can perform an intersection calculation with a surface. Thus, a large number of surface intersections can be computed simultaneously. Each cell is loaded with a set of coefficients defining the surface for which it is to perform the intersection test.
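The per-pixel test described above can be sketched in outline as follows. This is an illustrative model only, not the patented implementation: the names `intersect_depth` and `visible_surface` are hypothetical, and each iteration of the loop stands in for one pipeline cell performing its intersection calculation against the surface coefficients it has been loaded with.

```python
# Illustrative sketch of ray/surface intersection testing, assuming each
# surface is an infinite plane a*x + b*y + c*z = d and the ray starts at
# the viewpoint (taken as the origin).

def intersect_depth(plane, ray_dir):
    """Depth t at which the ray t*ray_dir meets the plane (a, b, c, d);
    returns None for a ray parallel to the plane."""
    a, b, c, d = plane
    denom = a * ray_dir[0] + b * ray_dir[1] + c * ray_dir[2]
    if denom == 0:
        return None
    return d / denom

def visible_surface(planes, ray_dir):
    """Each loop iteration models one cell testing one surface; the
    nearest positive-depth intersection is the visible one."""
    best = None
    for i, plane in enumerate(planes):
        t = intersect_depth(plane, ray_dir)
        if t is not None and t > 0 and (best is None or t < best[1]):
            best = (i, t)
    return best  # (surface index, depth) or None
```

In hardware the cells evaluate all surfaces simultaneously rather than in a loop; the sketch only models the selection of the nearest intersection.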
An improvement to this arrangement is described in the applicant's UK Patent No. 2298111. In that document, the image is divided into sub-regions, or tiles, which can be processed in turn. It is proposed to use a variable tile size and to project a bounding box around each complete primitive object so that only those tiles falling within the bounding box require processing. To select a suitable tile size, the distribution of primitive objects on the visible screen is determined. Each object in a tile is stored as a number of triangles defined by surfaces and vertices, although other shapes are also possible. The surfaces which define the various primitive objects are then stored in a list, known as the display list. Because one object made of many surfaces may appear in a number of tiles, this avoids the need to store identical surfaces for each tile. Object pointers which identify the primitive objects in the display list are also stored, with one object pointer list per tile. The tiles can then be rendered in turn using the ray casting technique described above until all primitive objects within each tile are processed. This is a useful method because no effort is wasted rendering objects which are known not to be visible in a particular tile.
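The mapping from a bounding box to the tiles requiring processing can be illustrated as follows. This is a sketch under simplifying assumptions, not the patented method: it assumes square tiles of a fixed size (the patent uses a variable tile size), and the function name `tiles_for_bounding_box` is hypothetical.

```python
def tiles_for_bounding_box(bbox, tile_size, tiles_x, tiles_y):
    """Return the (tx, ty) indices of tiles overlapped by an object's
    screen-space bounding box; only these tiles need processing."""
    x0, y0, x1, y1 = bbox
    tx0 = max(0, int(x0) // tile_size)
    ty0 = max(0, int(y0) // tile_size)
    tx1 = min(tiles_x - 1, int(x1) // tile_size)
    ty1 = min(tiles_y - 1, int(y1) // tile_size)
    return [(tx, ty) for ty in range(ty0, ty1 + 1)
                     for tx in range(tx0, tx1 + 1)]
```

For example, a box spanning (10, 10) to (70, 40) with 32-pixel tiles overlaps a 3x2 block of six tiles; every other tile on the screen is skipped entirely.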
A further improvement on this system is proposed in the applicant's International Patent Application Nos. PCT/GB99/03707 (publication number WO 2000/028483) and PCT/GB2004/001076 (publication number WO 2004/086309) and the applicant's UK Patent No. 2343603, in which any tiles within the bounding box which are not required to display a particular primitive object are discarded before rendering.
FIG. 1 shows the type of processor 101 used in the existing systems described above. Essentially, there are three components. The tile accelerator unit (TA) 103 performs the tiling operation, i.e. selects a suitable tile size and divides the visible screen into tiles, and supplies the tile information, i.e. the 3-D object data for each tile, to the display list memory 105. The image synthesis processor (ISP) 107 uses the 3-D object data in the display list memory to perform the ray/surface intersection tests discussed above. This produces depth data for each elementary area of the visible screen. After this, the derived image data from the ISP 107 is supplied to the texturing and shading processor (TSP) 109, which applies texturing and shading data to surfaces which have been determined as visible and outputs image and shading data to a frame buffer memory 111. Thus, the appearance of each elementary area of the display is determined so as to represent the 3-D image.
In the systems described above, a problem may arise as the complexity of the scene to be rendered increases. Complex scenes require more 3-D object data for each tile to be stored in the display list memory and this means that storage requirements increase. If the display list memory runs out of space, parts of the scene may simply not be rendered and this type of image corruption is becoming less and less acceptable.
In order to solve this problem, the applicant's International Patent Application No. PCT/GB01/02536 (publication number WO 2001/095257) proposes the idea of partial rendering. The state of the system (ISP and TSP) is stored to memory before rendering of a tile is complete, and the state is reloaded at a later time in order to finish the rendering. This process is referred to as “z/frame buffer load and store”.
The screen is divided up into a number of regions called macro-tiles, each macro-tile consisting of a rectangular region of the screen. Memory in the display list is then divided into blocks and these are listed in a free store list. Blocks from the free store are then allocated to the macro-tiles as required. The tiling operation stores data associated with each macro-tile in each block. (The tiling operation performed by the TA fills the display list memory and is therefore sometimes referred to as Memory Allocation.) When the display list memory fills up, or reaches some predefined threshold, the system selects a macro-tile, performs a z/frame buffer load, and renders the contents of the macro-tile before saving it using a z/frame buffer store operation. Thus, depth data for the macro-tile is stored according to the data loaded into the display list so far. Upon completion of such a render, the system frees any memory blocks associated with that macro-tile, thereby making them available for further storage. (Because the rendering process frees up display list memory space, it is known as Memory De-Allocation.) So, the scene for each tile is constructed by a number of tiling operations followed by partial renders. Each partial render updates the depth data stored. This means that an upper bound on the memory consumption is imposed and also the memory bandwidth consumed by the system is minimised.
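The block allocation and de-allocation cycle can be modelled as follows. This is a toy sketch, not the patented implementation: the class name `FreeStore` and its methods are hypothetical, and the real system triggers a partial render automatically on reaching the threshold rather than raising an error.

```python
class FreeStore:
    """Toy model of display list memory: fixed-size blocks are handed
    out to macro-tiles on demand (Memory Allocation) and returned after
    that macro-tile is rendered (Memory De-Allocation)."""

    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))
        self.allocated = {}  # macro-tile id -> list of block ids

    def allocate(self, macro_tile):
        if not self.free:
            # In the real system this condition triggers a partial
            # render of a selected macro-tile instead of failing.
            raise MemoryError("display list full: partial render needed")
        block = self.free.pop()
        self.allocated.setdefault(macro_tile, []).append(block)
        return block

    def release(self, macro_tile):
        """Called after a z/frame buffer store completes the partial
        render: all blocks of the macro-tile return to the free store."""
        self.free.extend(self.allocated.pop(macro_tile, []))
```

The fixed block count imposes the upper bound on memory consumption described above: allocation can never exceed the blocks initially placed in the free store list.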
One example of a type of processor used in the partial rendering system is shown in FIG. 2. It can be seen that this is a modified version of FIG. 1. A z buffer memory 209 is linked to the ISP 207 via a z compression/decompression unit 211. This comes into operation when the system is rendering a complex scene and the display list memory 205 is not large enough to contain all the surfaces which need to be processed for a particular tile. The display list will be loaded with data by the TA 203 for all the tiles until it is substantially full (or until a predefined threshold is reached). This may, however, only represent a portion of the initial data. The image is rendered one tile at a time by ISP 207. The output data for each tile is provided to TSP 213, which uses texture data to texture the tile. At the same time, because the image data was incomplete, the result (i.e. depth data) from ISP 207 is stored to z buffer memory 209 via compression/decompression unit 211 for temporary storage. The rendering of the remaining tiles then continues with the incomplete image data until all the tiles have been rendered and stored in frame buffer memory 215 and in z buffer memory 209.
The first part of the display list is then discarded and the additional image data read into it. As processing is performed for each tile in turn by ISP 207, the relevant portion of data from z buffer memory 209 is loaded via the z compression/decompression unit 211 so that it can be combined with the new image data from display list memory 205. The new depth data for each tile is then fed to TSP 213 which combines it with texture data before supplying it to the frame buffer 215.
This process continues for all the tiles in the scene until all the image data has been rendered. Thus, it can be seen that the z buffer memory acts as a temporary store which enables a smaller display list memory to be used than would otherwise be necessary for rendering particularly complex scenes. The compression/decompression unit 211 is optional but it enables a smaller z buffer memory to be used. In addition, deferred pixel processing is enabled, which eliminates any redundant pixel operations.
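The multi-pass interaction between the display list and the z buffer can be sketched as below. This is an illustrative model under stated assumptions, not the patented design: `partial_render_passes` is a hypothetical name, each "batch" stands for one display-list-sized portion of the scene data, and only the depth merge is modelled (the real system also compresses the z data and produces textured output per pass).

```python
def partial_render_passes(batches, num_tiles):
    """Toy z buffer load/store cycle: each pass renders one batch of
    per-tile depth data and merges it with depths stored by earlier
    passes, so the display list never holds the whole scene at once."""
    INF = float("inf")
    z_buffer = {t: INF for t in range(num_tiles)}  # temporary store
    for batch in batches:            # each batch fits the display list
        for tile, depths in batch.items():
            nearest = min(depths, default=INF)
            # z load: combine new depth data with the stored result,
            # then z store the updated nearest depth for the tile.
            z_buffer[tile] = min(z_buffer[tile], nearest)
    return z_buffer
```

After the final pass, each tile holds the nearest depth over the whole scene, exactly as if the complete display list had been available in one pass.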
So, as discussed in International Patent Application No. PCT/GB01/02536, once the display list memory fills up, or reaches a certain threshold, the system selects a macro-tile to render in order to free up some display list memory. In that application, the selection of the macro-tile to render depends on a number of factors, for example the macro-tile which will release the most memory back to the free-store may be chosen.
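One such selection factor can be expressed very simply. The sketch below is illustrative only and shows just the single heuristic named above (most memory released); the application considers a number of factors, and the function name is hypothetical.

```python
def select_macro_tile(allocated):
    """Pick the macro-tile whose render would release the most blocks
    back to the free store; 'allocated' maps macro-tile -> block list."""
    return max(allocated, key=lambda mt: len(allocated[mt]))
```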
A further improvement to this system has been provided in the applicant's UK application no. 0619327.0. A schematic example of a processor used in that arrangement is shown in FIG. 3. It can be seen that the system 301 is similar to that of FIG. 2 and includes a TA 303, ISP 311, TSP 313 and Frame Buffer 315. In this case, however, the Display List Memory 305 and the Z Buffer Memory 307 are both part of a single heap of memory, termed the Parameter Memory 309. Allocation between the Display List and the Z Buffer within the parameter memory is according to need, which makes for more efficient memory usage. FIG. 3 does not show a z compression/decompression unit, as in FIG. 2, but such a unit could be included.
In addition, in the applicant's UK application no. 0619327.0, the display list memory comprises a block for each macro-tile that requires it and also a global list. It is possible for object data to traverse more than one macro-tile. In this case, the object data is allocated to the global list, which contains only object data that is in more than one macro-tile. The display list is thus grouped into macro-tiles plus a global list. All object and control stream data within a macro-tile will only be addressed by tiles that reside within a given macro-tile and any object data residing in more than one macro-tile is in the global list and this data can be addressed from any control stream. This also makes for more efficient memory usage and access.
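The routing of object data between a macro-tile list and the global list can be sketched as follows. This is a simplified illustration, not the method of the application: it assumes fixed square macro-tiles and decides from the object's screen-space bounding box, and `place_object` is a hypothetical name.

```python
def place_object(obj_bbox, macro_tile_size):
    """Route an object's data: if its bounding box lies within a single
    macro-tile, store it in that macro-tile's block; if it traverses
    more than one macro-tile, store it once in the global list."""
    x0, y0, x1, y1 = obj_bbox
    mt_min = (int(x0) // macro_tile_size, int(y0) // macro_tile_size)
    mt_max = (int(x1) // macro_tile_size, int(y1) // macro_tile_size)
    if mt_min == mt_max:
        return ("macro_tile", mt_min)
    return ("global", None)
```

Data in a macro-tile's own block is addressed only by control streams of tiles in that macro-tile, whereas global-list data can be addressed from any control stream, so the single global copy avoids duplication.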
FIG. 4 shows a particular example of a system described in UK 0619327.0. The arrangement includes a Dynamic Parameter Manager DPM 411, the ISP 407, the Macro Tiling Engine 403, the Tiling Engine TE 405 and the memory 409. In general terms, the MTE 403 generates the primitive object data for each macro-tile and enters that in the appropriate part of the memory 409. The TE 405 uses the primitive object data from MTE 403 to generate control stream data for each tile and enters that in the memory 409. Thus, the MTE and TE together act as the Tile Accelerator so that the memory area for each tile in each macro tile points to the appropriate object data memory areas. The ISP 407 uses the object data to derive depth data and stores that in the z-buffer part of memory 409.
A more detailed description of the FIG. 4 operation will now be given with reference also to FIG. 5. FIG. 5 shows several triangle shapes to be displayed. Each triangle shape is known as a primitive, or simply a shape. The screen is divided into 30 tiles. The primitives are divided into two primitive objects. The first primitive object has vertices 501, 503, 505, 507 and 509. The second primitive object has vertices 511, 513, 515, 517 and 519. Thus, each primitive object comprises a number of primitives.
At the first step, the vertex and primitive data are input (at 401). The triangles in FIG. 5 are grouped into the two primitive objects with vertices 501 to 509 and 511 to 519. Each of the primitive objects is written into memory by MTE 403 using the DPM 411 to appropriately allocate memory space, and the memory addresses are passed to the next step of tiling.
At the second step, the MTE processes the vertices 501 to 509 and 511 to 519 into screen space, removing any vertices which are off-screen and therefore invisible (none in this case). As already discussed, to minimise the memory bandwidth needed to fetch the primitive and vertex data, primitives are grouped into primitive objects in macro-tiles. Each macro-tile is typically a quarter or a sixteenth of the screen size. When primitive objects cross macro-tiles, their data may be written only once, to the global list. Thus, at 403, the primitive objects (each subject to a maximum number of vertices) are written into the part of memory 409 associated with that macro-tile. The memory pages associated with each macro-tile are allocated by the DPM 411 on demand. The number of vertices and primitives, together with the memory addresses of the primitive objects (these are known as vertex pointers), are sent to the TE 405 to be added into the control stream of the tiles in which the primitive objects are visible.
At the third step, the TE 405 writes control data to each tile. Each primitive from a primitive object written at MTE 403 is checked against each tile inside the bounding box of the primitive object. Control data are written into the control stream associated with the tile if there are any primitives visible in the tile. So, for the first primitive in FIG. 5, having vertices 501 to 509, the control data associated with the primitive object are only written into some of the tiles, like C2, C3, C4 and C5 (shaded grey), not into tiles such as C1 and C6. Similarly, for the second primitive object in FIG. 5, having vertices 511 to 519, the control data are only written for some of the tiles, like B4 and B5, but not for others, like B6. The control data written at this step include a primitive object header for the number of vertices and primitives in the primitive object, and a vertex pointer for the memory address of the primitive object written to. Separate memory spaces are allocated to each tile for the control stream by the DPM 411.
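The third step can be sketched as follows. This is an illustrative model only, not the implementation of the application: `build_control_streams` is a hypothetical name, and each object's per-primitive tile coverage is given directly rather than computed by checking primitives against tiles inside the bounding box.

```python
def build_control_streams(primitive_objects, all_tiles):
    """Tiling sketch: a control stream entry (primitive object header
    plus vertex pointer) is written to a tile's control stream only if
    some primitive of the object is visible in that tile."""
    streams = {tile: [] for tile in all_tiles}
    for obj in primitive_objects:
        entry = {
            # header: number of vertices and primitives in the object
            "header": (obj["num_vertices"], obj["num_primitives"]),
            # vertex pointer: memory address of the object's data
            "vertex_pointer": obj["address"],
        }
        for tile in all_tiles:
            # check each primitive of the object against this tile
            if any(tile in tiles for tiles in obj["primitive_tiles"]):
                streams[tile].append(entry)
    return streams
```

In the FIG. 5 example, the first primitive object's entry would appear in the streams of tiles such as C2 to C5 but not in C1 or C6, just as described above.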
Finally, at the fourth step, 3-D image processing is performed by ISP 407. The 3-D processing is performed for each tile in the screen. Referring to FIG. 6, which shows an overview of the data structure for tiling, the 3-D processing is performed for each tile from the region array 601. The control data for each tile comprise a number of control streams 0 to M in FIG. 6. Each control stream comprises a primitive object header and a vertex pointer, and the number of control streams in the control data for a tile depends on the number of primitive objects falling within that tile. The ISP traverses the control stream of each tile 603 and reads the vertex and primitive data from the memory address given by the vertex pointer in the control data 605. So, referring to FIG. 5 again, some of the tiles contain no primitives, some tiles, like B2, contain primitives from only one primitive object, and other tiles, like C2 and C3, contain primitives from both primitive objects. Image processing operations such as hidden surface removal, texturing and shading are performed on all the primitives from each primitive object valid in the tile.
Although the known arrangement described above does have many advantages, it does have some drawbacks. The tiling system processes a primitive object in each tile, so the memory writes for control data jump from one control stream to another. This breaks the memory burst size and reduces the efficiency of memory access.
Also, when a big primitive shape covers many of the tiles on the screen, the control data are repeated many times over. The vertex and primitive data in a particular memory block may be read many times in many tiles, during the 3-D image processing. This increases memory bandwidth. On the other hand, if only some of the vertices and primitives from a primitive object are visible in a given tile, it is not efficient to fetch the whole primitive object data.
The memory pages used in macro-tiles for vertex and primitive data are allocated on demand, and are therefore not necessarily contiguous pages. To prevent the data from a particular primitive object crossing a page boundary, a new page, from the free store, has to be allocated if there is insufficient space on the current page for the whole primitive object. This creates wastage. Also, in a pipelined system, the exact size of a primitive object is not known until the entire object has actually been written, so a maximum size has to be assumed at the start. This creates further wastage.
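The wastage described here can be quantified with a small sketch. This is illustrative only, with a hypothetical function name: because the object's final size is unknown when writing begins, the page check must assume the maximum size, and the remainder of the current page is wasted whenever that maximum does not fit.

```python
def place_object_on_page(page_offset, page_size, max_object_size):
    """Return (new write offset, bytes wasted). If the assumed maximum
    object size does not fit in the current page, writing moves to a
    fresh page and the rest of the current page is wasted."""
    remaining = page_size - page_offset
    if remaining < max_object_size:
        return 0, remaining   # start of a new page; remainder wasted
    return page_offset, 0     # object fits on the current page
```

For instance, with 4096-byte pages, an object assumed to need 200 bytes but started at offset 4000 forces a new page and wastes 96 bytes, even if the object finally turns out to be small enough to have fitted.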
For the partial rendering, discussed initially with reference to FIG. 2, there is a similar problem in that the compression/decompression unit 211 does not know the size of the compressed z data for a tile when it starts to compress that data and write it into memory 209. To avoid the compressed data for a tile crossing a memory page boundary, at the start of a tile, the compression/decompression unit 211 has to check whether there is enough space in the current memory page. This check has to assume a maximum size for the compressed z data, which creates wastage.
The inventors of the present invention have seen that various improvements could be made to the known systems described above.