The present invention relates generally to multithreaded processors, and in particular to thread-type-based load balancing in a multithreaded processor.
Multithreaded processors are known in the art. Such processors can manage multiple concurrent processing tasks, or “threads.” A thread can be a unit of processing work of any size, and processors can create and terminate threads at various times. Between creation and termination, the thread's state is maintained within the processor, even at times when the processor is not actively executing the thread. This allows the processor to switch back and forth among multiple threads, creating a higher apparent degree of parallelism than the processor hardware actually supports.
Graphics processors benefit greatly from multithreading. As is known in the art, computer-based rendering generally involves performing the same operations repeatedly on different input data. For instance, a scene to be rendered may be defined in terms of a large number of primitives (e.g., points, lines, simple polygons) in a three-dimensional space. The vertices of each primitive are transformed to a viewing space, and the primitive is “rasterized” to determine which pixels in the image plane that primitive covers. Thereafter, each pixel is shaded based on the primitive(s) that cover it. Vertex transformations generally entail performing the same computations on each vertex of each primitive, and pixel shading also usually involves computations that are repeated for multiple pixels and/or multiple primitives. Since each vertex is processed independently of each other vertex, a thread can be defined for each vertex to be processed. Similarly, since each pixel is processed independently, a thread can be defined for each pixel to be processed. A multithreaded graphics processor can process vertex (or pixel) threads in any order, with any number of threads being processed in parallel.
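By way of illustration only, the independence of per-vertex work described above can be sketched in a few lines of Python. The function names, the translation matrix, and the sample vertices are hypothetical and are chosen only for this example; the sketch runs each "thread" sequentially and is not a model of any particular graphics processor.

```python
def transform_vertex(matrix, vertex):
    """Apply a 4x4 row-major matrix to a homogeneous (x, y, z, w) vertex."""
    return tuple(
        sum(matrix[r][c] * vertex[c] for c in range(4)) for r in range(4)
    )


def run_vertex_threads(matrix, vertices, order):
    """Run one 'thread' per vertex in the given order; return results by vertex index."""
    results = {}
    for i in order:
        # Each thread reads only its own vertex, so order does not matter.
        results[i] = transform_vertex(matrix, vertices[i])
    return [results[i] for i in range(len(vertices))]


# Hypothetical transform: translate by (1, 2, 3).
TRANSLATE = [
    [1, 0, 0, 1],
    [0, 1, 0, 2],
    [0, 0, 1, 3],
    [0, 0, 0, 1],
]
VERTICES = [(0, 0, 0, 1), (1, 0, 0, 1), (0, 1, 0, 1)]

in_order = run_vertex_threads(TRANSLATE, VERTICES, [0, 1, 2])
reordered = run_vertex_threads(TRANSLATE, VERTICES, [2, 0, 1])
```

Because no thread depends on another thread's output, `in_order` and `reordered` hold the same transformed vertices, which is the property that lets a multithreaded processor schedule such threads freely.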
Vertex transformation and pixel shading are usually very different operations that demand different amounts of processing resources. Traditionally, graphics processors include separate hardware sections dedicated to vertex and pixel processing, with each section being optimized for one or the other type of thread. More recently, graphics processors in which at least some processing resources are shared between vertex threads and pixel threads have been proposed. Because the relative demand for vertex and pixel processing varies from application to application, allowing hardware resources to be redirected to pixel or vertex processing as needed should improve overall efficiency.
In a shared-resource graphics processor, pixel threads and vertex threads compete for a limited supply of various processing resources. If too much of a resource is devoted to vertex processing and too little to pixel processing, the pixel stage of the rendering pipeline will tend to back up, slowing image generation to possibly unacceptable rates. Eventually, the backpressure can also stall vertex processing and lead to idle cycles in the graphics processor. Conversely, if too much of a resource is devoted to pixel processing and too little to vertex processing, the pixel stage of the pipeline could become starved for input data, again leading to idle cycles in the graphics processor. For many current graphics applications, the fraction of processing work devoted to vertex processing is relatively small, so more resources should be devoted to pixel processing than to vertex processing. Devoting too much of a resource to pixel processing, however, can starve the pixel stage of input and lead to bubbles (idle cycles) in the pipeline.
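The two failure modes described above can be illustrated with a deliberately simplified simulation. Everything in this sketch is an assumption made for illustration: a fixed number of execution slots per cycle, a static split of those slots between vertex and pixel threads, a bounded queue between the two stages, an unlimited supply of vertices, and a constant number of pixels produced per vertex. It does not represent any actual load-balancing mechanism.

```python
def simulate(total_slots, vertex_slots, pixels_per_vertex, queue_cap, cycles):
    """Return (pixels_shaded, idle_slot_cycles) for a fixed vertex/pixel slot split."""
    pixel_slots = total_slots - vertex_slots
    queue = 0  # pixel work items waiting between the vertex and pixel stages
    pixels_shaded = 0
    idle = 0
    for _ in range(cycles):
        # Pixel threads drain the queue; pixel slots with no input sit idle.
        ran_pixels = min(pixel_slots, queue)
        queue -= ran_pixels
        pixels_shaded += ran_pixels
        idle += pixel_slots - ran_pixels
        # Vertex threads launch only if their output fits in the queue;
        # a full queue applies backpressure and leaves vertex slots idle.
        room = (queue_cap - queue) // pixels_per_vertex
        ran_vertices = min(vertex_slots, room)
        queue += ran_vertices * pixels_per_vertex
        idle += vertex_slots - ran_vertices
    return pixels_shaded, idle


# Hypothetical workload: 8 slots, 3 pixels per vertex, queue of 12, 100 cycles.
vertex_heavy = simulate(8, 6, 3, 12, 100)  # too many slots given to vertex threads
pixel_heavy = simulate(8, 1, 3, 12, 100)   # too many slots given to pixel threads
balanced = simulate(8, 2, 3, 12, 100)      # split matched to the 3:1 work ratio
```

Under these assumed parameters, the vertex-heavy split fills the queue and stalls on backpressure, the pixel-heavy split leaves pixel slots starved for input, and the matched split both shades the most pixels and accumulates the fewest idle slot-cycles.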
For optimal performance, it is desirable to keep the graphics processor fully busy most (ideally all) of the time. Accordingly, it is desirable to prevent too much of any resource from being devoted to one type of thread at the expense of another.