Many rendering engines for three-dimensional computer graphics are moving toward programmability and a more processor-like design so that they can adapt to complex and sophisticated shading algorithms. More specifically, rendering engines are no longer hardware with fixed graphics functions; they increasingly resemble processors, with built-in arithmetic units whose instruction sets are much like those of a CPU and which can be programmed to accommodate additional functions flexibly.
As CPU speeds increase, the gap between the memory access performance of a rendering engine and the processing performance of its arithmetic unit tends to grow. The arithmetic unit processes pixel data, and a read-modify-write (RMW) unit reads pixel data from and writes pixel data to a frame buffer. Since the latency of reading, modifying and writing pixel data is significantly longer than the latency of the arithmetic unit, rendering performance is reduced accordingly.
The relatively long latency of the arithmetic units in a processor-type rendering engine may make it necessary, when data input to the engine are interdependent, to suspend operation for a period determined by the latency of the arithmetic operations. This is likely to produce idle time (referred to as bubbles) in the pipeline and to lower efficiency. Bubbles can be concealed only by software means such as modifying shader code, which makes application development difficult.
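The effect of data dependency on pipeline bubbles can be illustrated with a minimal sketch. It assumes a hypothetical in-order, single-issue pipeline in which an instruction that depends on its predecessor cannot issue until the predecessor's result is available; the function name `count_bubbles` and its parameters are illustrative only and do not describe any particular rendering engine.

```python
def count_bubbles(depends_on_prev, latency):
    """Count idle issue slots in a toy in-order, single-issue pipeline.

    depends_on_prev[i] is True when instruction i needs the result of
    instruction i-1. `latency` is the number of cycles between issuing an
    instruction and its result becoming usable (an assumed model).
    """
    cycle = 0          # earliest cycle in which the next instruction may issue
    prev_issue = None  # cycle in which the previous instruction issued
    bubbles = 0
    for dep in depends_on_prev:
        issue = cycle
        if dep and prev_issue is not None:
            ready = prev_issue + latency  # when the needed result is usable
            if ready > issue:
                bubbles += ready - issue  # idle slots spent waiting
                issue = ready
        prev_issue = issue
        cycle = issue + 1
    return bubbles


independent = count_bubbles([False, False, False], latency=4)
dependent = count_bubbles([False, True, True], latency=4)
```

Under this model, a stream of independent instructions incurs no bubbles, whereas a chain of dependent instructions loses `latency - 1` issue slots per dependency. This is the idle time that, in the related art, could be hidden only by reordering shader code.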
With respect to the memory latency problem, data consistency must be guaranteed when a frame buffer is accessed by a read-modify-write operation, and this imposes restrictions that preclude complex control. In the related art, this has been addressed by isolating the shader from the RMW unit so that the RMW unit reads from and writes to the frame buffer, using a simple pipeline process. Such an approach enables flexible execution of a program since the shader does not access the frame buffer. There is a growing need, however, for more advanced graphics processing that includes the RMW function, so that the shader can execute complex shading algorithms or perform advanced arithmetic processing such as image processing. Because memory latency is quite long, the loss of processing efficiency due to bubbles is even greater than that caused by the latency of arithmetic operations, and this prevents graphics processing that includes the RMW function from attaining higher functionality.
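The consistency requirement for read-modify-write access, and its cost in bubbles, can be illustrated with a minimal sketch. It assumes a hypothetical pipeline in which each fragment reads its pixel at issue time and writes the blended value back `latency` cycles later; additive blending, the function name `blend_pipelined`, and the `interlock` flag are assumptions made for illustration and do not represent the related-art hardware.

```python
def blend_pipelined(fragments, latency, interlock):
    """Additively blend (pixel, value) fragments through a toy RMW pipeline.

    Each fragment reads its pixel when it issues and writes the sum back
    `latency` cycles later. With interlock=True, a fragment stalls until no
    write to its pixel is still in flight. Returns (frame_buffer, stalls).
    """
    frame_buffer = {}
    pending = []  # writes in flight, in issue order: (write_cycle, pixel, value)
    cycle = 0
    stalls = 0
    for pixel, value in fragments:
        if interlock:
            # Wait for every in-flight write to this pixel to land.
            busy_until = max(
                (wc for wc, px, _ in pending if px == pixel), default=cycle
            )
            if busy_until > cycle:
                stalls += busy_until - cycle
                cycle = busy_until
        # Commit the writes that have completed by the current cycle.
        for wc, px, val in pending:
            if wc <= cycle:
                frame_buffer[px] = val
        pending = [w for w in pending if w[0] > cycle]
        # Read now; the modified value is written back later.
        old = frame_buffer.get(pixel, 0)
        pending.append((cycle + latency, pixel, old + value))
        cycle += 1
    # Drain the writes still in flight, in issue order.
    for _, px, val in pending:
        frame_buffer[px] = val
    return frame_buffer, stalls


same_pixel = [(0, 1), (0, 1), (0, 1)]
unsafe_fb, unsafe_stalls = blend_pipelined(same_pixel, latency=3, interlock=False)
safe_fb, safe_stalls = blend_pipelined(same_pixel, latency=3, interlock=True)
```

Without the interlock, the three fragments all read the stale value and two updates are lost. With the interlock, the result is correct, but each overlapping fragment stalls for roughly the write-back latency. The longer the memory latency, the more bubbles are needed to keep the frame buffer consistent.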