1. Field of the Invention
The present invention generally relates to load-balancing in graphics processing systems.
2. Background Art
Conventional graphics processing systems, such as graphics processing units (GPUs), include a number of interrelated modules to perform critical image processing functions. These modules combine to form, as understood by those of skill in the art, a graphics pipeline. Included within this pipeline is a shader engine.
A single graphics pipeline can include many shader engines. Traditionally, the shader engine is responsible for providing texture to three-dimensional images for display on a monitor. One of the critical components within each shader engine is a single-instruction-stream, multiple-data-stream (SIMD) module. SIMD modules (or simply SIMDs) perform one operation on multiple sets of data, and they handle the majority of the workload within the shader engine. Each SIMD processes a portion of the workload within its shader engine. Therefore, a task of critical importance to maximizing efficiency and throughput within the shader engine is determining how to distribute the workload across SIMDs.
A common assumption is that shader engine performance will increase with a corresponding increase in the number of SIMDs. This assumption holds, however, only in limited circumstances, such as during heavy workload conditions. Under lighter loads, which are frequently the case, the performance of the shader engine may not necessarily increase as the number of SIMDs increases. In these situations, if all the SIMDs are enabled, power is wasted because underutilized SIMDs (i.e., SIMDs with lighter or no workloads) remain enabled and active.
Conventional graphics systems simply divide the workload across all of the SIMDs within the shader engines used for a particular operation. This approach, however, is extremely power-inefficient. The inefficiency occurs because there is no determination of whether fewer SIMDs could be used to perform the operation without compromising system performance. Thus, conventional systems keep all of the SIMDs within the shader engine active regardless of whether they are required to perform the operations.
As noted above, when SIMDs are enabled, they consume power. Even in an idle state, SIMDs still consume a minimal, but measurable, amount of power. Thus, keeping all available SIMDs enabled, even if unused or underutilized, wastes power. It would also be beneficial to compact the workload into as few SIMDs as possible. This approach would reduce the unnecessary overhead of presenting the workload to every available SIMD.
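The contrast between the conventional even division and a compacted distribution can be illustrated with a minimal sketch. The function names, the per-SIMD `capacity` parameter, and the example workload are hypothetical and chosen only for illustration; they do not describe any particular hardware or the method of the present invention.

```python
def spread_evenly(workload, num_simds):
    """Conventional approach: divide the workload across every SIMD."""
    base, extra = divmod(workload, num_simds)
    return [base + (1 if i < extra else 0) for i in range(num_simds)]


def compact(workload, num_simds, capacity):
    """Pack the workload into as few SIMDs as possible.

    `capacity` is a hypothetical per-SIMD limit on work units.
    SIMDs that receive no work are left with a load of zero.
    """
    loads = []
    remaining = workload
    for _ in range(num_simds):
        take = min(remaining, capacity)
        loads.append(take)
        remaining -= take
    if remaining > 0:
        raise ValueError("workload exceeds total SIMD capacity")
    return loads


def enabled_count(loads):
    """Number of SIMDs that must stay enabled to carry the given loads."""
    return sum(1 for load in loads if load > 0)


even = spread_evenly(10, 8)          # every SIMD receives some work
packed = compact(10, 8, capacity=4)  # only the first few SIMDs receive work
print(enabled_count(even), enabled_count(packed))
```

Under these assumed numbers, the even division keeps all eight SIMDs busy with one or two work units each, while the compacted distribution occupies only three, leaving five as candidates for being disabled.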
Workloads can also be problematic for conventional graphics systems in other ways. For example, the problems of an unpredictable, or improperly distributed, workload can be exacerbated by the chip's instantaneous rate of current change (di/dt). In the absence of workload within a conventional graphics core, the chip consumes a certain amount of current. If the workload suddenly surges, however, the chip begins to draw more current. If the workload arrives in total (i.e., at once or over a very short period of time), the core goes from idle to completely busy. Correspondingly, the current will also go from minimum to maximum in a short amount of time, causing a severe di/dt effect. Ideally, di/dt should be as small as possible.
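The di/dt effect described above can be made concrete with a small numerical sketch. The current figures (`idle_ma`, `active_ma_per_simd`), the per-tick activation schedules, and the idea of staggering activation are all assumptions made for illustration; they are not measurements and are not presented as the approach of the present invention.

```python
def max_current_step(active_per_tick, idle_ma, active_ma_per_simd):
    """Largest change in current between consecutive time ticks.

    active_per_tick: number of busy SIMDs at each tick (hypothetical schedule).
    idle_ma, active_ma_per_simd: assumed current figures in milliamps.
    """
    currents = [idle_ma + n * active_ma_per_simd for n in active_per_tick]
    return max(abs(b - a) for a, b in zip(currents, currents[1:]))


# Workload arrives all at once: the core goes from idle to fully busy in one tick.
burst = [0, 8, 8, 8, 8]
# Same final state, but SIMDs become busy two at a time.
ramp = [0, 2, 4, 6, 8]

print(max_current_step(burst, idle_ma=100, active_ma_per_simd=50))
print(max_current_step(ramp, idle_ma=100, active_ma_per_simd=50))
```

With these assumed values, the burst schedule produces a single step four times larger than any step in the staggered schedule, which is the kind of abrupt change that a small di/dt is meant to avoid.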
What are needed, therefore, are methods and systems to determine the current and future utilization of each SIMD, activate the SIMDs in accordance with this determination, and distribute the workload to the activated SIMDs. What are also needed are methods and systems to reduce the negative effects of di/dt.