As technologies develop rapidly, the complexity of 3-dimensional computer generated images increases. One can readily build a computer model of very complicated 3D objects or motion, such as human movements, using vertices and triangle meshes. Such 3D models can then be sent to a 3D computer graphics system, which generates animated 3D images on the computer screen. Computer generated 3D animated images are widely used in 3D computer games, navigation tools and computer aided engineering design tools.
A 3D computer graphics system has to cope with demands for more complex graphics and faster display. As the detail in the displayed model increases, more and more primitives and vertices are used. Also, as texture and shading techniques evolve, more and more information accompanies the primitive and vertex data. In modern games there may be over a million primitives in a render. Memory bandwidth is therefore a major factor in the performance of computer graphics systems.
Tile based rendering systems are well-known. These subdivide an image into a plurality of rectangular blocks or tiles. In British Patent GB2343603 and International Patent Application number WO 2004/086309 the render surface is divided into sub surfaces of n×m pixel tiles. A primitive such as a triangle, line or point is only processed in the tiles which overlap with the primitive.
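The idea that a primitive is only considered in the tiles it overlaps can be illustrated with a minimal Python sketch. It uses the primitive's screen-space bounding box as a conservative first test; the function name, the 16×16 default tile size and the inclusive pixel coordinates are assumptions made for illustration and are not taken from the cited patents.

```python
import math


def tiles_overlapping_bbox(min_x, min_y, max_x, max_y,
                           screen_w, screen_h, tile_w=16, tile_h=16):
    """Return (tx, ty) indices of all tiles touched by a bounding box.

    Coordinates are screen-space pixels, inclusive. The box is clamped
    to the screen, so a box that lies fully off screen yields no tiles.
    """
    tiles_x = math.ceil(screen_w / tile_w)
    tiles_y = math.ceil(screen_h / tile_h)
    tx0 = max(0, math.floor(min_x / tile_w))
    ty0 = max(0, math.floor(min_y / tile_h))
    tx1 = min(tiles_x - 1, math.floor(max_x / tile_w))
    ty1 = min(tiles_y - 1, math.floor(max_y / tile_h))
    return [(tx, ty)
            for ty in range(ty0, ty1 + 1)
            for tx in range(tx0, tx1 + 1)]


# A small primitive on a 64x64 screen touches a 3x2 group of tiles.
print(len(tiles_overlapping_bbox(10, 10, 40, 20, 64, 64)))  # 6
```

A bounding box may include tiles that the primitive itself does not touch, so a real system would typically follow this with an exact per-tile coverage test.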
The main steps performed for tiling in a tile based 3D computer graphics system are shown in FIG. 1.
1. Input data stream of primitives and vertices as 101 in FIG. 1. Primitives in similar locations may arrive sequentially in time, like triangle strips and fans.
2. A Macro Tiling Engine (MTE) transforms the vertices into screen space and removes primitives which are back facing or clipped by a clipping plane, using well-known methods. The primitives are grouped into primitive blocks with a fixed maximum number of vertices and primitives, and are written into memory as 102 in FIG. 1. The number of vertices and primitives, together with the memory addresses of the primitive blocks (primitive block pointers), are sent to a Tiling Engine to be added to the control stream of the display list for the tiles which are covered by the primitives.
To minimize the impact on memory bandwidth when fetching primitive and vertex data, primitive blocks are assigned to macro tiles according to the bounding box of the primitive block. As shown in FIG. 2, a macro tile 201 is a rectangular area of the screen 200 containing a fixed number of tiles 202. A macro tile can be a quarter or a 16th of the screen size, and is used to localize the primitive blocks and reduce memory bandwidth. When the primitives in a primitive block cross macro tiles, they are written to a special macro tile called the global macro tile. In this way the parameter data in a primitive block are only written once. Primitives from a macro tile can only be accessed by the tiles inside that macro tile, while primitives from the global macro tile can be accessed by all the tiles.
3. Tiling Engine (TE) as 103 in FIG. 1. Each primitive from a primitive block written by the Macro Tiling Engine 102 is checked against each tile inside the bounding box of the primitive block. The primitive block is added to the display list of every tile which is covered by any primitive in the primitive block. The control data written to the control stream associated with the display list of the tile include a primitive block header giving the number of vertices and primitives in the primitive block, a primitive block pointer giving the memory address to which the primitive block was written, and a primitive mask identifying the primitives which are visible in the tile.
Separate memory spaces are allocated to each tile for control stream data in the display list. A memory address pointer called a tail pointer is used for the next free address in the control stream data of each tile.
To improve memory access for the control streams in tiles, a small cache, the Tail Pointer Cache, can be added to the Tiling Engine. The memory location of the end of the control stream in a tile is stored in and read from the Tail Pointer Cache, which reduces main memory accesses from the Tiling Engine.
4. 3D image processing. The 3D image processing in a tile based 3D computer graphics system is performed for each tile of the screen from a region array 300 in FIG. 3. The processing traverses the control stream of each tile 301 in FIG. 3, and reads the vertex and primitive data from the memory addresses pointed to by the primitive block pointers in the control data 302 in FIG. 3. Image processing operations such as hidden surface removal, texturing and shading are performed on all the primitives from the primitive block which are valid in the tile.
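The tiling steps above can be sketched in Python as follows. This is an illustrative model only, not the hardware described: the class and function names, the 16 pixel tile size, the 2×2 macro tile layout, the identifier used for the global macro tile and the bounding-box coverage test (used here in place of an exact triangle/tile test) are all assumptions.

```python
from dataclasses import dataclass

TILE = 16                   # assumed tile size in pixels
MACRO_TILES_PER_AXIS = 2    # assumed: four macro tiles, a quarter screen each
GLOBAL_MACRO_TILE = -1      # hypothetical id for the global macro tile


@dataclass
class PrimitiveBlock:
    address: int            # hypothetical memory address of the block
    num_vertices: int
    triangles: list         # each triangle is three (x, y) screen points


@dataclass
class ControlEntry:
    num_vertices: int       # primitive block header
    num_primitives: int
    pointer: int            # primitive block pointer
    primitive_mask: int     # bit i set if triangle i is visible in the tile


def bbox(points):
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


def block_bbox(block):
    return bbox([p for tri in block.triangles for p in tri])


def macro_tile_for(block, screen_w, screen_h):
    """Pick the macro tile for a block, or the global one if it spans several."""
    x0, y0, x1, y1 = block_bbox(block)
    mw = screen_w / MACRO_TILES_PER_AXIS
    mh = screen_h / MACRO_TILES_PER_AXIS
    mx0, mx1 = int(x0 // mw), int(x1 // mw)
    my0, my1 = int(y0 // mh), int(y1 // mh)
    if mx0 != mx1 or my0 != my1:
        return GLOBAL_MACRO_TILE
    return my0 * MACRO_TILES_PER_AXIS + mx0


def tile_block(block, control_streams, screen_w, screen_h):
    """Append one control entry to every tile that the block touches."""
    tiles_x = -(-screen_w // TILE)   # ceiling division
    tiles_y = -(-screen_h // TILE)
    bx0, by0, bx1, by1 = block_bbox(block)
    tx0, tx1 = max(0, int(bx0 // TILE)), min(tiles_x - 1, int(bx1 // TILE))
    ty0, ty1 = max(0, int(by0 // TILE)), min(tiles_y - 1, int(by1 // TILE))
    for ty in range(ty0, ty1 + 1):
        for tx in range(tx0, tx1 + 1):
            mask = 0
            for i, tri in enumerate(block.triangles):
                x0, y0, x1, y1 = bbox(tri)
                # Conservative test: triangle bounding box against the tile.
                if (x0 < (tx + 1) * TILE and x1 >= tx * TILE and
                        y0 < (ty + 1) * TILE and y1 >= ty * TILE):
                    mask |= 1 << i
            if mask:
                control_streams.setdefault((tx, ty), []).append(
                    ControlEntry(block.num_vertices, len(block.triangles),
                                 block.address, mask))
```

In this sketch each tile's control stream is a Python list, so appending to it plays the role of writing at the tail pointer; the Tail Pointer Cache itself is not modelled.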
An example of a tile based render is shown in FIG. 8. A macro tile MT0 800, which is part of the screen, contains 16 tiles 801. Two triangle strips 802, 803 and a big triangle 804 are processed by the Macro Tiling Engine MTE 805 and projected into the part of the screen in MT0. The vertex and primitive data associated with the three primitive blocks are written to memory 807, each at its own memory address. The memory address pointers of the three primitive blocks are then passed to the Tiling Engine TE 806 for tiling. All the tiles are traversed by the Tiling Engine to decide whether any primitives are inside each tile, and the control stream data associated with each primitive block which is visible in a tile are written to memory for that tile's display list. In the example, the display control stream in T4 will include control data for the address pointers of the three primitive blocks and the triangle visible masks of the triangles within the primitive blocks 802 and 804. For example, the first three triangles from the left in primitive block 802 and the big triangle from primitive block 804 are visible in T4. For tile T10 the control stream data will consist only of the information from primitive blocks 803 and 804. Meanwhile, control data such as the address pointer of the vertex data associated with primitive block 803 are inserted into the control streams for tiles T10, T11, T14 and T15, while control data associated with primitive block 804 are inserted into the control streams for every tile in the macro tile MT0.
In 3D render processing 808 all the tiles are processed one by one in the order of the region array, as 300 in FIG. 3. For each tile, the control data from the tile control stream are read from memory first, as 301 in FIG. 3. The vertex and primitive data associated with the primitive blocks in the tile control data are then read from memory, as 302 in FIG. 3, so that all the triangles which are visible in the tile are processed and rendered to the screen.
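The read side of this process can be sketched in Python as below. The data layout is hypothetical: the region array is modelled as a list of tile identifiers, each control stream as a list of dictionaries with `pointer` and `mask` keys, and memory as a dictionary from address to primitive block. The `process` callback stands in for hidden surface removal, texturing and shading.

```python
def render_tiles(region_array, control_streams, memory, process):
    """Walk the tiles in region-array order and process visible primitives."""
    for tile in region_array:
        # Read the control stream for this tile (empty if nothing covers it).
        for entry in control_streams.get(tile, []):
            # Follow the primitive block pointer to fetch the block data.
            block = memory[entry["pointer"]]
            # Keep only the primitives flagged in the primitive mask.
            visible = [prim for i, prim in enumerate(block["primitives"])
                       if (entry["mask"] >> i) & 1]
            process(tile, visible)


# Usage: one block of three triangles, two of them visible in tile (0, 0).
memory = {0x100: {"primitives": ["tri0", "tri1", "tri2"]}}
control_streams = {(0, 0): [{"pointer": 0x100, "mask": 0b101}]}
rendered = []
render_tiles([(0, 0), (1, 0)], control_streams, memory,
             lambda tile, prims: rendered.append((tile, prims)))
print(rendered)  # [((0, 0), ['tri0', 'tri2'])]
```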
In a tile based computer graphics system the render is performed on a tile by tile basis. The big advantage of tile based rendering is that it significantly reduces the requirements for internal storage in the graphics system and for memory bandwidth.
The disadvantage of tile based rendering is the increased control data needed for the display list in each tile. Display control data need to be written to all the tiles which a triangle covers. For large triangles which cover many tiles the total amount of control data written is significant.
For example, a render with a screen size of 1920×1080 can be divided into 8160 tiles of 16×16 pixels. A big triangle which covers the whole screen has to be added to the display lists of all the tiles. If two 32 bit words are needed for the control data in each tile, then the total control stream data is nearly 64 KB (8160 × 8 bytes = 65,280 bytes) for the single full screen triangle. The large amount of control data needed in this case has a significant impact on the performance of the render in a tile based 3D computer graphics system.
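The arithmetic behind this figure can be checked with a short Python calculation. The function name and parameters are illustrative only; the tile size and the two 32 bit words per tile are the values from the example above.

```python
import math


def control_stream_bytes(screen_w, screen_h, tile_w=16, tile_h=16,
                         words_per_entry=2, bytes_per_word=4):
    """Tile count and control data size for one full screen triangle."""
    tiles_x = math.ceil(screen_w / tile_w)   # 1920 / 16 = 120
    tiles_y = math.ceil(screen_h / tile_h)   # 1080 / 16 = 67.5, rounded up to 68
    num_tiles = tiles_x * tiles_y
    return num_tiles, num_tiles * words_per_entry * bytes_per_word


tiles, total = control_stream_bytes(1920, 1080)
print(tiles, total, total / 1024)  # 8160 65280 63.75
```

The last row of tiles is only partly on screen, which is why 1080 lines require 68 rather than 67 rows of tiles.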
The system presented in UK Patent Application No. 0717781.9 processes two tiles at the same time in order to improve the performance of the Tiling Engine. The control stream data from the two adjacent tiles are combined into a single display list, which gives total control stream data of nearly 32 KB for the above example. The control stream data are about half the size in this case, but this is still a significant amount of control data for a single triangle.
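As a rough check of the halved figure, the following Python sketch counts one display list entry per group of tiles. It assumes the paired tiles are horizontally adjacent and that each pair needs the same two 32 bit words as a single tile; neither detail is specified here, and the names are hypothetical.

```python
import math


def grouped_control_bytes(screen_w, screen_h, group_w=1, group_h=1,
                          tile=16, bytes_per_entry=8):
    """Control data for a full screen triangle with one entry per tile group."""
    tiles_x = math.ceil(screen_w / tile)
    tiles_y = math.ceil(screen_h / tile)
    groups = math.ceil(tiles_x / group_w) * math.ceil(tiles_y / group_h)
    return groups * bytes_per_entry


single = grouped_control_bytes(1920, 1080)             # one entry per tile
paired = grouped_control_bytes(1920, 1080, group_w=2)  # one entry per tile pair
print(single, paired, paired / 1024)  # 65280 32640 31.875
```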