I. Field
The present disclosure relates generally to image processing, and more specifically to techniques for load balancing a three-dimensional (3D) graphics pipeline for fast pixel rendering by an interleaved multi-threaded processor.
II. Background
Converting information about 3D objects into a bit map that can be displayed is known as pixel rendering, and requires considerable memory and processing power. In the past, 3D graphics were available only on powerful workstations, but now 3D graphics accelerators are commonly found in personal computers (PCs). The hardware graphics accelerator contains memory (e.g., instruction random access memory (IRAM)) and a specialized microprocessor to handle many of the 3D rendering operations. OpenGL® (Open Graphics Library) for desktops defines an application programming interface (API) for writing applications that produce 3D and 2D computer graphics. The API includes hundreds of functions for drawing complex three-dimensional scenes from primitives.
OpenGL® ES is a subset of the desktop OpenGL® which creates an interface between software and graphics. The 3D graphics engine (OpenGL® ES) is generally implemented in two parts. The first part includes those functions which process vertices and is typically implemented in digital signal processor (DSP) firmware. The second part includes those functions for pixel rendering and is implemented in a dedicated hardware graphics accelerator. The second part, which performs the pixel rendering, is the last pipeline stage of a conventional 3D graphics engine. The last pipeline stage processes input triangle sets to produce a pixel representation of the graphics image. However, the last pipeline stage is typically the performance bottleneck of the entire 3D graphics pipeline in the engine. Therefore, it is very important to improve the performance (in pixels per second) of the last pipeline stage for pixel rendering.
Typically, during pixel rendering operations, the input triangles must be processed sequentially, in the same order in which they are input. Thus, a multi-threaded processor is prevented from using interleaved parallel processing to process an input triangle.
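The ordering constraint described above can be illustrated with a minimal sketch. It is not the method of the present disclosure or of any particular accelerator; it assumes a simple edge-function coverage test, and the names edge and rasterize, the integer color values, and the tuple layout of each triangle are hypothetical choices made only for illustration. Triangles are drawn one at a time in input order, so a later triangle overwrites an earlier one wherever they overlap.

```python
def edge(ax, ay, bx, by, px, py):
    """Edge function: twice the signed area of triangle (A, B, P)."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize(triangles, width, height):
    """Render triangles sequentially, in input order, into a bit map.

    Each triangle is (a, b, c, color), where a, b, c are (x, y) vertices.
    This layout is an assumption for illustration only.
    """
    bitmap = [[0] * width for _ in range(height)]
    for a, b, c, color in triangles:
        area = edge(*a, *b, *c)
        if area == 0:
            continue  # degenerate triangle covers no pixels
        for y in range(height):
            for x in range(width):
                w0 = edge(*b, *c, x, y)
                w1 = edge(*c, *a, x, y)
                w2 = edge(*a, *b, x, y)
                # A pixel is covered when all three edge values share
                # the sign of the triangle's area (either winding order).
                if area > 0:
                    inside = w0 >= 0 and w1 >= 0 and w2 >= 0
                else:
                    inside = w0 <= 0 and w1 <= 0 and w2 <= 0
                if inside:
                    bitmap[y][x] = color  # later triangles overwrite earlier ones
    return bitmap


# Two overlapping triangles; swapping their order changes the result.
big = ((0, 0), (4, 0), (0, 4), 1)
small = ((0, 0), (2, 0), (0, 2), 2)
front_small = rasterize([big, small], 5, 5)
front_big = rasterize([small, big], 5, 5)
```

Because the value left at an overlapped pixel depends on which triangle was drawn last, simply handing whole triangles to different threads in arbitrary order could change the rendered image.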
Furthermore, hardware graphics accelerators are generally not flexible or easily scalable. Thus, hardware graphics accelerators cannot easily add new features, support higher versions of the 3D graphics standard (such as OpenGL® ES 1.0, 1.1, etc.), or support different application configurations and custom requirements. Furthermore, hardware graphics accelerators are not easily scaled for different performance requirements (frame rate, screen size, pixel rate, triangle rate, etc.) to optimize silicon cost and system power consumption.
As can be readily seen, a dedicated hardware graphics accelerator takes up silicon area in small handheld computing devices, such as a mobile or cellular telephone. Accordingly, including a dedicated hardware graphics accelerator and its IRAM increases the overall cost of a handheld computing device. The use of a dedicated hardware graphics accelerator also produces data traffic with the DSP, which adds overhead and consumes power.
There is therefore a need in the art for techniques to load balance a three-dimensional (3D) graphics pipeline to provide quicker pixel rendering processing.