A tiling rendering architecture subdivides a computer-generated image into smaller parts that are rendered separately. Each part is called a tile. The pipeline of a tiling rendering architecture often consists of a front-end and a back-end. The front-end performs vertex-shading on the vertices in the scene and sorts each resulting triangle into the tiles it overlaps. Note that shading of non-geometric attributes may be delayed until the back-end. The back-end, which runs after the front-end, processes each tile separately by vertex-shading any remaining attributes, rasterizing the tile's triangles, and pixel-shading the resulting fragments.
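The sorting step of the front-end can be illustrated with a minimal sketch. It assumes triangles are already vertex-shaded into screen-space pixel coordinates and uses a bounding-box overlap test, which is conservative: a triangle may be listed in a tile that its bounding box touches but the triangle itself does not. The function name `bin_triangles`, the tile size, and the sample data are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def bin_triangles(triangles, width, height, tile_size):
    """Sort screen-space triangles into the tiles their bounding boxes overlap.

    Assumption: each triangle is a list of three (x, y) pixel coordinates.
    Returns a dict mapping (tile_x, tile_y) to a list of triangle indices.
    """
    tiles_x = math.ceil(width / tile_size)
    tiles_y = math.ceil(height / tile_size)
    bins = defaultdict(list)
    for tri_id, tri in enumerate(triangles):
        xs = [v[0] for v in tri]
        ys = [v[1] for v in tri]
        # Skip triangles whose bounding box lies entirely off screen.
        if max(xs) < 0 or min(xs) >= width or max(ys) < 0 or min(ys) >= height:
            continue
        # Clamp the bounding box to the valid range of tile indices.
        tx0 = max(math.floor(min(xs) / tile_size), 0)
        tx1 = min(math.floor(max(xs) / tile_size), tiles_x - 1)
        ty0 = max(math.floor(min(ys) / tile_size), 0)
        ty1 = min(math.floor(max(ys) / tile_size), tiles_y - 1)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins[(tx, ty)].append(tri_id)
    return dict(bins)


triangles = [
    [(1, 1), (10, 1), (1, 10)],            # fits inside tile (0, 0)
    [(20, 20), (40, 20), (20, 40)],        # bounding box spans four tiles
    [(-50, -50), (-10, -50), (-50, -10)],  # entirely off screen
]
bins = bin_triangles(triangles, width=64, height=64, tile_size=32)
```

The back-end can then process each entry of `bins` independently, which is what makes per-tile parallelism possible. A tighter overlap test would reduce the number of triangles listed per tile at the cost of more front-end work.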
Parallel hardware with many independent execution units, called cores, needs a strategy to distribute rendering work evenly among the cores in order to fully utilize its resources, i.e. the work needs to be load balanced. Load balancing matters because rendering performance can differ substantially depending on how the work is distributed.
The front-end can split the scene geometry into suitable chunks for each core to process in parallel. Each such chunk is called a geometry batch. The splits can be arbitrary and, thus, it is easy to achieve good load balance in the front-end. The back-end is inherently parallel since each tile can be processed independently. This does not, however, guarantee a good load balance. Depending on the distribution of geometry and shading complexity in the scene, the majority of the work may end up in only a few of the tiles. In the worst case, a single tile is expensive and the rest are cheap. This results in a load imbalance, since the core that picks the expensive tile will require a long time to process it. During this time the remaining cores will be idle, since they finish their work quickly.
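The worst case described above can be made concrete with a small simulation. This is a sketch under stated assumptions, not a model of any particular hardware: each tile has a known scalar cost, tiles are handed out in order, and whichever core becomes idle first picks the next tile. The function name `simulate_backend` and the cost values are hypothetical.

```python
import heapq


def simulate_backend(tile_costs, num_cores):
    """Simulate cores picking tiles in order; return (makespan, utilization).

    Assumption: the core that becomes idle first takes the next tile.
    Utilization is total work divided by (makespan * num_cores).
    """
    finish_times = [0] * num_cores  # min-heap of per-core finish times
    heapq.heapify(finish_times)
    for cost in tile_costs:
        earliest = heapq.heappop(finish_times)  # next core to become idle
        heapq.heappush(finish_times, earliest + cost)
    makespan = max(finish_times)
    if makespan == 0:
        return 0, 1.0  # no work, so nothing was left idle
    utilization = sum(tile_costs) / (makespan * num_cores)
    return makespan, utilization


# Hypothetical costs: one expensive tile followed by 15 cheap ones.
skewed = [100] + [1] * 15
# Hypothetical costs: similar total work spread evenly over 16 tiles.
even = [7] * 16

skewed_makespan, skewed_util = simulate_backend(skewed, num_cores=4)
even_makespan, even_util = simulate_backend(even, num_cores=4)
```

With the skewed costs, one core spends 100 time units on the expensive tile while the other three finish their five cheap tiles each after 5 units and then sit idle, giving a utilization of about 29 percent. With the even costs, all four cores finish after 28 units and utilization is 100 percent, even though the total amount of work is nearly the same.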