1. Field of the Invention
Embodiments of the invention generally relate to computer processors, and more specifically to dynamic load balancing in a massively threaded graphics processing environment.
2. Description of the Related Art
The process of rendering two-dimensional images from three-dimensional scenes is commonly referred to as image processing. As the modern computer industry evolves, image processing evolves as well. One particular goal in the evolution of image processing is to make two-dimensional simulations or renditions of three-dimensional scenes as realistic as possible. One limitation of rendering realistic images is that modern monitors display images using pixels.
A pixel is the smallest area of space that can be illuminated on a monitor. Most modern computer monitors use hundreds of thousands or millions of pixels to compose the entire display or rendered scene. The individual pixels are arranged in a grid pattern and collectively cover the entire viewing area of the monitor. Each individual pixel may be illuminated to render a final picture for viewing.
One technique for rendering a real-world three-dimensional scene onto a two-dimensional monitor using pixels is called rasterization. Rasterization is the process of taking a two-dimensional image represented in vector format (mathematical representations of geometric objects within a scene) and converting the image into individual pixels for display on the monitor. Rasterization is effective at rendering graphics quickly and with relatively little computational power; however, it suffers from some drawbacks. For example, rasterization often lacks realism because it is not based on the physical properties of light; rather, it is based on the shapes of three-dimensional geometric objects in a scene projected onto a two-dimensional plane. Furthermore, the computational power required to render a scene with rasterization scales directly with the complexity of the scene. As image processing becomes more realistic, rendered scenes also become more complex. Therefore, rasterization suffers as image processing evolves, because its cost grows directly with scene complexity.
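The conversion of vector geometry into pixels can be illustrated with a small sketch. The following assumes a single two-dimensional triangle whose vertices are already given in pixel coordinates, and uses edge functions to decide which pixel centers fall inside it. The function names `edge` and `rasterize_triangle` are illustrative only and are not part of the original description.

```python
# Minimal sketch: scan-convert one 2D triangle into pixels using edge functions.
# Assumption: vertices are (x, y) tuples already expressed in pixel coordinates.

def edge(ax, ay, bx, by, px, py):
    """Twice the signed area of triangle (a, b, p); its sign tells which side of a->b p lies on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize_triangle(v0, v1, v2, width, height):
    """Return the set of (x, y) pixels whose centers lie inside or on the triangle."""
    covered = set()
    area = edge(*v0, *v1, *v2)
    if area == 0:
        return covered  # degenerate triangle covers no pixels
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5  # sample at the pixel center
            w0 = edge(*v1, *v2, px, py)
            w1 = edge(*v2, *v0, px, py)
            w2 = edge(*v0, *v1, px, py)
            # Inside when every edge function has the same sign as the triangle's area
            if area > 0 and w0 >= 0 and w1 >= 0 and w2 >= 0:
                covered.add((x, y))
            elif area < 0 and w0 <= 0 and w1 <= 0 and w2 <= 0:
                covered.add((x, y))
    return covered


pixels = rasterize_triangle((0, 0), (4, 0), (0, 4), width=4, height=4)
print(len(pixels))  # 10 pixel centers satisfy x + y <= 3
```

Every pixel in the bounding grid is tested against every triangle, which is one way to see why the work grows with the amount of geometry in the scene.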
Another technique for rendering a real-world three-dimensional scene onto a two-dimensional monitor using pixels is called ray tracing. The ray tracing technique traces the propagation of imaginary rays, which behave similarly to rays of light, into a three-dimensional scene that is to be rendered onto a computer screen. The rays originate from the eye(s) of a viewer sitting behind the computer screen and pass through the pixels that make up the computer screen toward the three-dimensional scene. Each traced ray proceeds into the scene and may intersect objects within the scene. If a ray intersects an object within the scene, properties of the object and several other contributing factors are used to calculate the amount of color and light, or lack thereof, to which the ray is exposed. These calculations are then used to determine the final color of the pixel through which the traced ray passed.
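As a sketch of the intersection and coloring step, the code below tests one ray against a single sphere and, on a hit, computes a simple diffuse (Lambertian) color. The sphere, the unit-length `light_dir` pointing from the surface toward the light, and the diffuse model itself are illustrative assumptions; the description above does not prescribe any particular object type or lighting model.

```python
import math


def intersect_sphere(origin, direction, center, radius):
    """Return the distance t to the nearest hit along the ray, or None on a miss.

    Solves |origin + t*direction - center|^2 = radius^2 for t > 0.
    Assumes `direction` is a unit vector, so the quadratic's leading term is 1.
    """
    oc = [o - c for o, c in zip(origin, center)]
    b = 2.0 * sum(d * v for d, v in zip(direction, oc))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - 4.0 * c
    if disc < 0:
        return None  # the ray misses the sphere
    sqrt_disc = math.sqrt(disc)
    for t in ((-b - sqrt_disc) / 2.0, (-b + sqrt_disc) / 2.0):
        if t > 1e-9:  # ignore hits behind (or at) the ray origin
            return t
    return None


def shade(origin, direction, center, radius, light_dir, base_color):
    """Color seen along one ray: black background on a miss, diffuse color on a hit.

    Hypothetical lighting model: brightness = max(0, normal . light_dir).
    """
    t = intersect_sphere(origin, direction, center, radius)
    if t is None:
        return (0.0, 0.0, 0.0)
    hit = [o + t * d for o, d in zip(origin, direction)]
    normal = [(h - c) / radius for h, c in zip(hit, center)]
    lambert = max(0.0, sum(n * l for n, l in zip(normal, light_dir)))
    return tuple(ch * lambert for ch in base_color)


eye = (0.0, 0.0, 0.0)
print(intersect_sphere(eye, (0.0, 0.0, 1.0), (0.0, 0.0, 5.0), 1.0))  # 4.0
print(shade(eye, (0.0, 0.0, 1.0), (0.0, 0.0, 5.0), 1.0, (0.0, 0.0, -1.0), (1.0, 0.5, 0.2)))
```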
The process of tracing rays is carried out many times for a single scene. For example, a single ray may be traced for each pixel in the display. Once a sufficient number of rays have been traced to determine the color of all of the pixels which make up the two-dimensional display of the computer screen, the two-dimensional synthesis of the three-dimensional scene can be displayed on the computer screen to the viewer.
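The one-ray-per-pixel loop can be sketched as follows, assuming a pinhole camera with the eye at the origin looking down the +z axis and a virtual screen plane at z = 1. The field-of-view parameter `fov_deg` and the `trace` callback (which maps a ray to a color) are hypothetical names introduced for this illustration.

```python
import math


def primary_ray(x, y, width, height, fov_deg=90.0):
    """Unit direction from an eye at the origin through the center of pixel (x, y).

    Assumes a pinhole camera looking down +z with the screen plane at z = 1.
    """
    aspect = width / height
    scale = math.tan(math.radians(fov_deg) / 2.0)
    # Map the pixel center to [-1, 1] screen coordinates (y grows upward on screen)
    sx = (2.0 * (x + 0.5) / width - 1.0) * aspect * scale
    sy = (1.0 - 2.0 * (y + 0.5) / height) * scale
    length = math.sqrt(sx * sx + sy * sy + 1.0)
    return (sx / length, sy / length, 1.0 / length)


def render(width, height, trace):
    """Trace exactly one ray per pixel and collect the resulting colors row by row."""
    eye = (0.0, 0.0, 0.0)
    image = []
    for y in range(height):
        row = [trace(eye, primary_ray(x, y, width, height)) for x in range(width)]
        image.append(row)
    return image


# Usage: a stand-in trace function that simply returns the ray direction as its "color"
image = render(3, 3, lambda origin, direction: direction)
print(image[1][1])  # the center pixel looks straight ahead: (0.0, 0.0, 1.0)
```

Once every entry of `image` has been filled, the two-dimensional picture is complete and can be displayed.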
Ray tracing typically renders real-world three-dimensional scenes with more realism than rasterization. This is partly because ray tracing simulates how light travels and behaves in a real-world environment, rather than simply projecting a three-dimensional shape onto a two-dimensional plane as is done with rasterization. Therefore, graphics rendered using ray tracing more accurately depict on a monitor what our eyes are accustomed to seeing in the real world.
Furthermore, ray tracing handles increases in scene complexity better than rasterization does. Ray tracing scales logarithmically with scene complexity. This is because the same number of rays may be cast into a scene even as the scene becomes more complex, and, when the scene geometry is organized in a hierarchical spatial index (e.g., a kd-tree or bounding volume hierarchy), the work needed to find each ray's nearest intersection grows roughly with the logarithm of the number of objects. Therefore, unlike rasterization, ray tracing's computational requirements do not grow in direct proportion to scene complexity.
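The difference in scaling can be made concrete with a deliberately simplified cost model, which is an assumption for illustration rather than a measurement: rasterization cost proportional to the number of primitives N, and ray tracing cost proportional to R · log2(N) for a fixed number of rays R traversing a balanced hierarchy. The helper name `relative_cost_growth` is hypothetical.

```python
import math


def relative_cost_growth(n_before, n_after):
    """Factor by which per-frame work grows when the primitive count rises.

    Simplified cost model (an assumption, not a measurement):
      rasterization ~ N            (every primitive is processed)
      ray tracing   ~ R * log2(N)  (R rays, each descending a balanced hierarchy)
    R cancels in the ratio because the number of rays cast is unchanged.
    """
    raster = n_after / n_before
    ray = math.log2(n_after) / math.log2(n_before)
    return raster, ray


raster, ray = relative_cost_growth(1_000_000, 2_000_000)
print(f"rasterization x{raster:.2f}, ray tracing x{ray:.2f}")
# Doubling a million-primitive scene doubles rasterization work
# but adds only about 5% to ray-tracing work under this model.
```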
One major drawback of ray tracing is the large number of calculations, and thus the processing power, required to render scenes. This leads to problems when fast rendering is needed, for example, when an image processing system renders graphics for animation purposes, such as in a game console. Due to the increased computational requirements of ray tracing, it is difficult to render animation quickly enough to seem realistic (realistic animation is approximately twenty to twenty-four frames per second).
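A quick budget calculation shows the scale of the problem. The sketch below assumes a 1920 × 1080 display and counts only one primary ray per pixel; the resolution and the helper name `ray_budget` are illustrative assumptions, and secondary rays (reflections, shadows) would multiply the totals further.

```python
def ray_budget(width, height, fps, rays_per_pixel=1):
    """Return (milliseconds available per frame, rays per frame, rays per second)."""
    frame_ms = 1000.0 / fps
    rays_per_frame = width * height * rays_per_pixel
    return frame_ms, rays_per_frame, rays_per_frame * fps


frame_ms, per_frame, per_second = ray_budget(1920, 1080, fps=24)
print(f"{frame_ms:.1f} ms per frame, {per_frame:,} rays per frame, {per_second:,} rays per second")
# 41.7 ms per frame, 2,073,600 rays per frame, 49,766,400 rays per second
```

Each of those roughly fifty million rays per second must also be intersected with the scene and shaded, which is why sustaining realistic frame rates with ray tracing is demanding.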
To address this problem and others, the software industry has developed applications with more than one thread of execution, or threaded applications. The use of two or more threads allows a program to fork (or split) itself into a plurality of simultaneously (or pseudo-simultaneously) running tasks. To optimize the performance of threaded applications, the industry has developed computers whose processors have multiple processing cores, which may be used to simultaneously process data from multiple threads of execution (e.g., from multiple programs, from multiple processes, and/or from multiple threads). Each processing core may itself process multiple threads of execution, for example, by processing the threads simultaneously (simultaneous multithreading) or by processing each thread for a short amount of time (e.g., as determined by a priority) before processing a subsequent thread, as known to those skilled in the art.
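Forking rendering work into threads can be sketched with Python's standard `concurrent.futures.ThreadPoolExecutor`. Here the frame is split into one task per pixel row; `shade_row` is a hypothetical stand-in for real per-pixel work, and the choice of four worker threads is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def shade_row(y, width):
    """Hypothetical per-row work: a simple gradient stands in for real shading."""
    return [(x + y) % 256 for x in range(width)]


def render_threaded(width, height, num_threads=4):
    """Fork the frame into per-row tasks, run them on a pool of threads, and join the results."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Executor.map returns results in input order, so rows come back in y order
        rows = list(pool.map(shade_row, range(height), [width] * height))
    return rows


frame = render_threaded(8, 6)
print(len(frame), len(frame[0]))  # 6 rows of 8 pixels
```

In CPython, the global interpreter lock means these threads run pure-Python work pseudo-simultaneously rather than truly in parallel, which corresponds to the time-sliced case described above; the fork/join structure is the same either way.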
There is generally a desire to have as many processing cores as possible, each concurrently processing as many threads as possible, in order to obtain the greatest processing power and efficiency from the processor. For example, a plurality of threads may be used to execute an application such as a video game, which performs three-dimensional graphics rendering, sound effects, physics simulations, player input/output, and other functions. To provide the most realistic experience to the video game player, there may be a desire to have each thread perform a given function (e.g., one thread may draw a three-dimensional scene, also referred to as rendering, while another thread performs a physics calculation) requiring a certain amount of processing power for a set amount of time. For example, if the processor is being used to render a three-dimensional sequence of an action performed by a video game player, there may be a desire to render each picture (referred to as a frame) in the sequence quickly so that the action appears smooth (e.g., if the action is a jump, there may be a desire for the jump to rise and fall smoothly, as would a real-life jump).
In order to maintain simultaneously executing threads, the processor may be configured to efficiently retrieve data and/or instructions for each executing thread from the computer's main memory. In some cases, the retrieved data and instructions may be placed in one or more small memories referred to as caches, which may be located on the same chip as the processor. The caches may also be arranged hierarchically, for example, such that a first cache (referred to as a level two cache, or L2 cache) is shared by each processing core in a processor while multiple smaller caches (referred to as level one, or L1, caches) are provided for a given processing core or group of processing cores. Where data and instructions requested by a thread are not available in any of the processor caches, the processor may request the data and instructions from the main memory.
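The lookup order through such a hierarchy can be modeled as below, assuming one private L1 cache per core and a single L2 cache shared by all cores. The class name `CacheHierarchy` is hypothetical, and the caches are plain sets with no capacity limits or eviction, which real caches of course have.

```python
class CacheHierarchy:
    """Toy model: a private L1 per core, one shared L2, then main memory.

    Simplifying assumption: caches are unbounded sets of addresses (no eviction).
    """

    def __init__(self, num_cores):
        self.l1 = [set() for _ in range(num_cores)]  # one small cache per core
        self.l2 = set()                              # shared by every core

    def load(self, core, addr):
        """Return the level that served the request, filling caches on the way back."""
        if addr in self.l1[core]:
            return "L1"
        if addr in self.l2:
            self.l1[core].add(addr)
            return "L2"
        # Miss in every cache: fetch from main memory and fill both levels
        self.l2.add(addr)
        self.l1[core].add(addr)
        return "memory"


caches = CacheHierarchy(num_cores=2)
print(caches.load(0, 0x100))  # "memory": first touch misses everywhere
print(caches.load(0, 0x100))  # "L1": now cached privately for core 0
print(caches.load(1, 0x100))  # "L2": core 1 finds it in the shared cache
```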
While the requested data and instructions are retrieved from main memory, execution of the requesting thread may be temporarily paused by the processing core to allow time for the request to be fulfilled. In some cases, other threads may be executed while the requesting thread is paused. However, if too many threads are paused waiting for data and instructions, one or more processing cores in the processor may remain idle while the data and instructions are retrieved from the main memory.
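The effect of paused threads on core utilization can be shown with a small cycle-by-cycle simulation. It assumes a single core, hypothetical threads that each compute for `work` cycles and then miss in the cache and wait `latency` cycles for main memory, and a core that switches to another ready thread on a miss at no cost. None of these parameters come from the description above; they only illustrate how idle time arises.

```python
def idle_cycles(num_threads, work, latency, total_cycles):
    """Count cycles in which the core has no ready thread to run.

    Each hypothetical thread computes for `work` cycles, then misses and waits
    `latency` cycles for main memory, repeating forever. The core runs one ready
    thread at a time and switches on a miss (zero switch cost assumed).
    """
    remaining_work = [work] * num_threads  # compute cycles left before the next miss
    ready_at = [0] * num_threads           # cycle at which each thread may run again
    current = None
    idle = 0
    for cycle in range(total_cycles):
        if current is None:
            # Pick the first thread whose memory request has been fulfilled
            for t in range(num_threads):
                if ready_at[t] <= cycle:
                    current = t
                    break
        if current is None:
            idle += 1  # every thread is paused waiting on memory
            continue
        remaining_work[current] -= 1
        if remaining_work[current] == 0:
            # Miss: this thread pauses until main memory responds
            ready_at[current] = cycle + 1 + latency
            remaining_work[current] = work
            current = None
    return idle


for n in (1, 2, 3):
    print(n, "thread(s):", idle_cycles(n, work=2, latency=4, total_cycles=12), "idle of 12 cycles")
# 1 thread(s): 8 idle, 2 thread(s): 4 idle, 3 thread(s): 0 idle
```

In this model, the core stays busy only when enough threads are available to cover the memory latency (here 1 + latency / work = 3 threads); with fewer, every thread is paused at some point and the core sits idle.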