This invention relates to a three-dimensional computer graphics rendering system and in particular to a method and an apparatus associated with combining multiple independent tile based graphics cores for the purpose of increasing geometry processing performance.
It is desirable to offer computer graphics processing cores at many different performance points, e.g. from basic hand-held applications through to sophisticated dedicated graphics computers. However, the complexity of modern computer graphics makes it difficult to do this in either a timely or cost effective manner. As such, it is desirable to have a method of combining multiple independent cores such that performance may be increased without developing a whole new core.
Tile based rendering systems are well-known. These subdivide an image into a plurality of rectangular blocks or tiles. FIG. 1 illustrates an example of a tile based rendering system. A primitive/command fetch unit 101 retrieves command and primitive data from memory and passes the command and the primitive data to a geometry processing unit 102. The geometry processing unit 102 transforms the primitive and command data into screen space using well-known methods. This data is then supplied to a tiling unit 103 which inserts object data from the screen space geometry into object lists for each of a set of defined rectangular regions or tiles. An object list for each tile contains primitives that exist wholly or partially in that tile. An object list exists for every tile on the screen, although some object lists may have no data in them. These object lists are fetched by a tile parameter fetch unit 105 which supplies the object lists tile by tile to a hidden surface removal unit (HSR) 106. The hidden surface removal unit (HSR) 106 removes surfaces which will not contribute to the final scene (usually because they are obscured by another surface). The HSR unit 106 processes each primitive in the tile and passes only data for visible pixels to a texturing and shading unit (TSU) 108. The TSU takes the data from the HSR and uses the data to fetch textures and apply shading to each pixel within a visible object using well-known techniques. The TSU then supplies the textured and shaded data to an alpha test/fogging/alpha blending unit 110. The alpha test/fogging/alpha blending unit 110 can apply degrees of transparency/opacity to the surfaces, again using well-known techniques. Alpha blending is performed using an on-chip tile buffer 112, thereby eliminating the requirement to access external memory for this operation.
Once each tile has been completed, the pixel processing unit 114 performs any necessary backend processing such as packing and anti-alias filtering before writing the result data to a rendered scene buffer 116, ready for display.
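The tiling step described above can be illustrated with a minimal sketch. It is not the tiling unit 103 itself: it assumes each primitive is represented only by a screen-space bounding box, so a primitive may be listed in a tile that its bounding box touches but the primitive does not. The names `build_tile_lists`, `tile_size` and the tuple layout are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical primitive record: (name, min_x, min_y, max_x, max_y) in pixels.
Primitive = Tuple[str, float, float, float, float]
TileCoord = Tuple[int, int]


def build_tile_lists(
    primitives: List[Primitive], screen_w: int, screen_h: int, tile_size: int
) -> Dict[TileCoord, List[str]]:
    """Insert each primitive into the object list of every tile its
    bounding box overlaps (a conservative approximation of tiling)."""
    tiles_x = (screen_w + tile_size - 1) // tile_size
    tiles_y = (screen_h + tile_size - 1) // tile_size

    # An object list exists for every tile, even if it stays empty.
    lists: Dict[TileCoord, List[str]] = {
        (tx, ty): [] for ty in range(tiles_y) for tx in range(tiles_x)
    }

    for name, x0, y0, x1, y1 in primitives:
        # Clamp the covered tile range to the screen.
        tx0 = max(0, int(x0 // tile_size))
        ty0 = max(0, int(y0 // tile_size))
        tx1 = min(tiles_x - 1, int(x1 // tile_size))
        ty1 = min(tiles_y - 1, int(y1 // tile_size))
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                lists[(tx, ty)].append(name)
    return lists


# Illustrative data only: two primitives on a 96x64 screen with 32-pixel tiles.
example_lists = build_tile_lists(
    [("A", 0, 0, 10, 10), ("B", 20, 20, 40, 40)], 96, 64, 32
)
```

Because primitives are visited in the order given, each object list in this sketch holds its entries in submission order, and tiles that no primitive touches keep an empty list.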
In British Patent No. GB2343598 there is described a process of scaling rasterization performance within a tile based rendering environment by separating geometry processing and tiling operations into a separate processor that supplies multiple rasterization cores. This method does not take into account the issues of scaling geometry processing and in particular tiling throughput across multiple parallel tile based cores.
It is commonly known that 3D hardware devices must preferably preserve the ordering of primitives with respect to the order in which they were submitted by a supplying application. For example, FIG. 2 illustrates four triangles T1 (200), T2 (210), T3 (220) and T4 (230) that are presented by the application in the order T1, T2, T3, T4 and overlap the four tiles Tile 0 (240), Tile 1 (250), Tile 2 (260) and Tile 3 (270) as shown. In order to preserve the original order of the triangles in the tile lists, the triangles would be referenced in each tile list as follows.
Tile 0 Tile 1 Tile 2 Tile 3
T1 T2 T3 T3 T2 T3 T4 T3
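The ordering requirement can be stated as a simple check: within any one tile list, primitives must appear in the same relative order as in the application's submission order. The following is a minimal sketch of such a check; the function name `preserves_submission_order` and the sample lists are hypothetical and are not taken from FIG. 2.

```python
from typing import List, Sequence


def preserves_submission_order(
    tile_list: Sequence[str], submission_order: Sequence[str]
) -> bool:
    """Return True if tile_list references primitives in strictly the same
    relative order as submission_order (which also rules out duplicates)."""
    rank = {name: i for i, name in enumerate(submission_order)}
    ranks: List[int] = [rank[name] for name in tile_list]
    return all(a < b for a, b in zip(ranks, ranks[1:]))


# Illustrative lists only, using a submission order of T1, T2, T3, T4.
submitted = ["T1", "T2", "T3", "T4"]
in_order = preserves_submission_order(["T1", "T3"], submitted)
out_of_order = preserves_submission_order(["T3", "T2"], submitted)
```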
In order to evenly distribute load across geometry and tiling processors, the input data needs to be split across the processors either on a round-robin basis or based on the load on individual processors. However, as each processor is generating object tile lists locally, the preservation of the order in which objects are inserted into tiles requires that the order in which the processors write to the per tile object lists be controlled. This control would normally require communication between each of the graphics processing cores (GPCs) present, meaning that their design would need to be changed when scaling the number of cores present.
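The ordering problem described above can be reproduced with a small sketch. It assumes a round-robin split across two cores, each of which builds its own local tile lists, followed by a naive merge that appends one core's list after the other's with no coordination. All names (`round_robin_split`, `tile_locally`, `naive_merge`) and the primitives P1 to P4 are hypothetical, and the sketch illustrates only the problem, not any particular solution.

```python
from typing import Dict, List, Tuple

# Hypothetical input: (primitive name, indices of the tiles it overlaps).
Work = Tuple[str, List[int]]


def round_robin_split(primitives: List[Work], num_cores: int) -> List[List[Work]]:
    """Hand primitive i to core i mod num_cores."""
    return [primitives[i::num_cores] for i in range(num_cores)]


def tile_locally(batch: List[Work]) -> Dict[int, List[str]]:
    """Build per-tile object lists for one core's share of the input."""
    lists: Dict[int, List[str]] = {}
    for name, tiles in batch:
        for tile in tiles:
            lists.setdefault(tile, []).append(name)
    return lists


def naive_merge(per_core: List[Dict[int, List[str]]]) -> Dict[int, List[str]]:
    """Concatenate each core's local lists, core by core, with no ordering control."""
    merged: Dict[int, List[str]] = {}
    for local in per_core:
        for tile, names in local.items():
            merged.setdefault(tile, []).extend(names)
    return merged


# Four primitives submitted as P1, P2, P3, P4, all overlapping tile 0.
submitted = [("P1", [0]), ("P2", [0]), ("P3", [0]), ("P4", [0])]
batches = round_robin_split(submitted, 2)
merged = naive_merge([tile_locally(b) for b in batches])
# merged[0] is ['P1', 'P3', 'P2', 'P4'], which is not the submission order.
```

In this sketch each core's local list is correctly ordered on its own, but the combined list for tile 0 is not, which is why the writes to the per tile object lists have to be controlled across cores.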