As processing power has increased, the complexity of 3-dimensional computer generated images has also increased. Computer models of very complicated 3D objects and motion, such as human movement, built from vertices and triangle meshes, have become easier to generate. This kind of 3D model can be sent to a 3D computer graphics system, which generates animated 3D images on a computer screen. Computer generated 3D animated images are widely used in 3D computer games, navigation tools and computer aided engineering design tools.
3D computer graphics systems have to cope with demands for more complex graphics and faster display. As the detail in the displayed model increases, more and more primitives and vertices are used. Also, as texture and shading techniques evolve, more and more information is attached to primitive and vertex data. In modern games there may be over a million primitives in a render. Memory bandwidth is therefore a major factor affecting the performance of computer graphics systems.
Tile based rendering systems are well-known. These subdivide an image into a plurality of rectangular blocks or tiles. As described in British Patent No. GB2343603 and International Patent Application No. WO 2004/086309, these systems divide a render surface into sub surfaces of n×m pixel tiles. A primitive such as a triangle, line or point is processed only in the tiles which overlap with the primitive.
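The subdivision into tiles can be illustrated with a minimal Python sketch. It is not taken from the cited patents: the function names, the 16×16 default tile size and the use of a bounding box as a conservative overlap test are assumptions made for illustration only.

```python
def tile_grid(width, height, tile_w, tile_h):
    """Number of tile columns and rows, rounding up for partial edge tiles."""
    return -(-width // tile_w), -(-height // tile_h)


def overlapped_tiles(vertices, width, height, tile_w=16, tile_h=16):
    """Tiles (column, row) touched by the bounding box of a point, line or triangle.

    Assumption: the bounding box is used as a conservative test, so a thin
    diagonal primitive may be reported in tiles it does not actually cover.
    """
    cols, rows = tile_grid(width, height, tile_w, tile_h)
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    # Clamp the tile range to the render surface.
    tx0 = max(int(min(xs)) // tile_w, 0)
    tx1 = min(int(max(xs)) // tile_w, cols - 1)
    ty0 = max(int(min(ys)) // tile_h, 0)
    ty1 = min(int(max(ys)) // tile_h, rows - 1)
    return [(tx, ty) for ty in range(ty0, ty1 + 1) for tx in range(tx0, tx1 + 1)]
```

A point yields a single tile, while a horizontal line from (10, 10) to (40, 10) yields the three tiles in the first row which it crosses.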
The main steps performed for tiling in a tile based 3D computer graphics system are shown in FIG. 1. These are as follows:
1. An input data stream of primitives and vertices is received at 101 in FIG. 1. Primitives in similar locations, such as triangle strips and fans, may arrive sequentially in time.
2. A Macro Tiling Engine (MTE) transforms the vertices into screen space and removes primitives which are back facing or clipped by a clipping plane, using well-known methods. The primitives are then grouped into primitive blocks with a fixed maximum number of vertices and primitives, and are written into memory at 102 in FIG. 1. The number of vertices and primitives, together with the memory addresses of the primitive blocks (primitive block pointers), are sent to a Tiling Engine at 103 to be added to a control stream for a display list for the tiles which are covered by the primitives.
To minimize the memory bandwidth needed to fetch primitive and vertex data, primitives are grouped into primitive blocks in macro tiles according to a bounding box for the primitive block. In FIG. 2 a macro tile 201 is a rectangular area of the screen 200 with a fixed number of tiles 202. A macro tile can be, for example, a quarter or a sixteenth of the screen size. This structure is used to localize primitive blocks and reduce memory bandwidth. When primitives in a primitive block cross macro tiles, they are written to a special macro tile list called a global macro tile list. In this way the parameter data for a primitive block are only written once. Primitives from a macro tile can only be accessed by the tiles inside the macro tile, while primitives from the global macro tile list can be accessed by all the tiles.
3. Each primitive from a primitive block written by the Macro Tiling Engine at 102 is checked against each tile inside the bounding box of the primitive block. The primitive block is added to the display list for any tile which is covered by any primitive in the primitive block. The control data written in the control stream associated with the display list of the tile includes a primitive block header giving the number of vertices and primitives in the primitive block, a primitive block pointer giving the memory address to which the primitive block was written, and a primitive mask identifying the primitives which are visible in the tile.
Separate memory spaces are allocated to each tile for the control stream data in the display list. A memory address pointer called a tail pointer is used for the next free address in the control stream data of each tile.
To improve memory access for the control streams in tiles, a small cache, the Tail Pointer Cache, can be added to the Tiling Engine 103. The memory location of the end of the control stream in a tile is stored in and read from the Tail Pointer Cache, which reduces main memory accesses from the Tiling Engine.
4. 3D image processing in a tile based 3D computer graphics system is performed at 104 for each tile of the screen from a region array 300 of FIG. 3. The process traverses the control stream of each tile 301 in FIG. 3 and reads the vertex and primitive data from the memory addresses pointed to by the primitive block pointers in the control data 302. Image processing operations such as hidden surface removal, texturing and shading are performed on all the primitives from the primitive block which are valid in the tile.
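The tail pointers and Tail Pointer Cache described in step 3 can be sketched as follows. This is an illustrative Python model only: the class name, the least recently used replacement policy, the write-back behaviour, the capacity and the access counters are all assumptions, since the text above does not specify how the cache is organized.

```python
from collections import OrderedDict


class TailPointerCache:
    """Small write-back cache of per-tile tail pointers (hypothetical LRU policy)."""

    def __init__(self, tail_pointers, capacity=4):
        self.memory = tail_pointers   # main memory copy: tile -> next free word address
        self.capacity = capacity      # number of cached tail pointers (assumed)
        self.lines = OrderedDict()    # cached tail pointers, least recently used first
        self.memory_reads = 0
        self.memory_writes = 0

    def _load(self, tile):
        if tile in self.lines:
            self.lines.move_to_end(tile)      # cache hit, no main memory access
            return
        if len(self.lines) >= self.capacity:  # evict and write back the oldest entry
            old_tile, old_pointer = self.lines.popitem(last=False)
            self.memory[old_tile] = old_pointer
            self.memory_writes += 1
        self.lines[tile] = self.memory[tile]
        self.memory_reads += 1

    def allocate(self, tile, num_words):
        """Return the address for num_words of control data and advance the tail."""
        self._load(tile)
        address = self.lines[tile]
        self.lines[tile] = address + num_words
        return address

    def flush(self):
        """Write all cached tail pointers back to main memory."""
        for tile, pointer in self.lines.items():
            self.memory[tile] = pointer
            self.memory_writes += 1
        self.lines.clear()
```

In this model, consecutive control stream writes to the same tile read the tail pointer from main memory once, and later writes are served from the cache until the entry is evicted or flushed.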
An example of a tile based render is shown in FIG. 6. A macro tile MT0 600, which is part of the screen, contains 16 tiles 601. Two triangle strips 602, 603 and a big triangle 604 are processed by the Macro Tiling Engine (MTE) 605 and projected into the part of the screen in MT0. The vertex and primitive data associated with the three primitive blocks are written to memory 607, each at its own memory address. The memory address pointers of the three primitive blocks are then passed to the Tiling Engine (TE) 606 for tiling. The Tiling Engine traverses all the tiles to decide whether any primitives are inside each tile, and control stream data for each primitive block which is visible in the tile are written to memory for the tile display list. In the example, the control stream for tile T5 will include control data for the address pointers of the three primitive blocks and the triangle visible masks of the triangles within the three primitive blocks. For example, the first three triangles from the left in primitive block 602 and the first triangle from the right in primitive block 603 are visible in T5, together with triangle 604. For tile T9 the control stream data will consist only of the information from primitive blocks 603 and 604. Meanwhile, the control data, such as the address pointer of the vertex data, associated with primitive block 602 are inserted into the two control streams for tiles T5 and T6, while the control data associated with primitive block 604 are inserted into the control streams for every tile in the macro tile MT0.
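The macro tile selection of step 2 and the per-tile control data of step 3, as illustrated by this example, can be sketched in Python. The sketch does not reproduce the geometry of FIG. 6. The tile size, the number of tiles per macro tile, the GLOBAL_LIST key, the function names and the use of triangle bounding boxes as a conservative coverage test are assumptions made for illustration, and coordinates are assumed to be on screen after clipping.

```python
TILE_SIZE = 16          # pixels per tile side (assumed)
MACRO_TILE_TILES = 4    # tiles per macro tile side (assumed)
GLOBAL_LIST = "global"  # hypothetical key for the global macro tile list


def bounding_box(points):
    """Return (min_x, min_y, max_x, max_y) of screen space points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


def tile_range(box):
    """Inclusive tile index range (tx0, ty0, tx1, ty1) covered by a box."""
    x0, y0, x1, y1 = box
    return (int(x0) // TILE_SIZE, int(y0) // TILE_SIZE,
            int(x1) // TILE_SIZE, int(y1) // TILE_SIZE)


def macro_tile_list(triangles):
    """Macro tile (column, row) holding the block, or the global list if it spans several."""
    tx0, ty0, tx1, ty1 = tile_range(bounding_box([p for t in triangles for p in t]))
    first = (tx0 // MACRO_TILE_TILES, ty0 // MACRO_TILE_TILES)
    last = (tx1 // MACRO_TILE_TILES, ty1 // MACRO_TILE_TILES)
    return first if first == last else GLOBAL_LIST


def tile_block(block_pointer, num_vertices, triangles, display_lists):
    """Append one control entry (header, pointer, mask) per tile touched by the block."""
    masks = {}
    for index, triangle in enumerate(triangles):
        tx0, ty0, tx1, ty1 = tile_range(bounding_box(triangle))
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                # Bit `index` marks triangle `index` as visible in this tile.
                masks[(tx, ty)] = masks.get((tx, ty), 0) | (1 << index)
    header = (num_vertices, len(triangles))
    for tile, mask in masks.items():
        display_lists.setdefault(tile, []).append((header, block_pointer, mask))
```

Each tile receives a single control entry per primitive block, however many triangles of the block it contains, and the mask records which of those triangles are visible in the tile.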
In 3D render processing 608, all the tiles are processed one by one in the order of the region array, shown as 300 in FIG. 3. For each tile, control data from the tile control stream are read from memory first, as at 301 in FIG. 3. Vertex and primitive data associated with the primitive blocks in the tile control data are then read from memory, as at 302 in FIG. 3, so that all the triangles which are visible in the tile are processed and rendered to the screen.
In a tile based computer graphics system, a render is performed on a tile by tile basis. The big advantage of tile based rendering is that it significantly reduces the requirement for large graphics system internal storage and memory bandwidth.
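The traversal order of the render can be sketched in Python as follows. The function name, the layout of a control entry as a (header, block pointer, mask) tuple and the modelling of memory as a dictionary are assumptions for illustration, and the actual hidden surface removal, texturing and shading are omitted.

```python
def render_order(region_array, control_streams, memory):
    """Yield (tile, primitive) pairs in the order a tile by tile render visits them."""
    for tile in region_array:                      # tiles in region array order
        for header, block_pointer, mask in control_streams.get(tile, []):
            num_vertices, num_primitives = header  # primitive block header
            block = memory[block_pointer]          # fetch the primitive block
            for index in range(num_primitives):
                if (mask >> index) & 1:            # primitive is visible in this tile
                    yield tile, block[index]
```

A tile with an empty control stream produces no work, and a primitive block referenced by several tiles is fetched once for each of those tiles.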
The disadvantage of tile based rendering is that increased control data is needed for the display list in each tile. Display control data needs to be written for all the tiles which a triangle covers. For big triangles which cover many tiles the total amount of control data written is significant.
For example, a render with a screen size of 1920×1080 can be divided into 8160 tiles of 16×16 pixels. A big triangle which covers the whole screen has to be added to the display lists of all the tiles. If two 32 bit words are needed for the control data in each tile, then the total control stream data is nearly 64 KB for the single full screen triangle. The amount of control data needed in this case has a significant impact on the memory space requirement and memory bandwidth, and therefore affects the performance of the render in a tile based 3D computer graphics system.
In UK Patent Application No. 0717781.9, a system is described which processes two tiles at the same time in order to improve the performance of a Tiling Engine. The control stream data from the two adjacent tiles are combined into a single display list, which reduces the total control stream data to around 32 KB in the above example. The control stream data is thus about halved, but it is still a significant amount of control data for a single triangle.
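The figures quoted in the two examples above can be checked with a short Python calculation. The function and parameter names are hypothetical, and the sketch assumes that partial tiles at the screen edge count as whole tiles and that each display list holds one two-word entry for the triangle.

```python
import math


def control_stream_bytes(width, height, tile_size=16, words_per_entry=2, tiles_per_list=1):
    """Return (tile count, control data bytes) for one full screen triangle."""
    tiles = math.ceil(width / tile_size) * math.ceil(height / tile_size)
    lists = math.ceil(tiles / tiles_per_list)   # display lists receiving an entry
    return tiles, lists * words_per_entry * 4   # 4 bytes per 32 bit word


tiles, single = control_stream_bytes(1920, 1080)
_, paired = control_stream_bytes(1920, 1080, tiles_per_list=2)
print(tiles, single / 1024, paired / 1024)  # 8160 63.75 31.875
```

The 1080 pixel height is not a multiple of 16, so the 8160 tile count corresponds to 120 columns by 68 rows with the last row only partly on screen.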