1. Field of the Invention
Embodiments of the present invention generally relate to graphics processing and, more particularly, to rendering images on systems with multiple graphics processing units (GPUs).
2. Description of the Related Art
Computer graphics image data typically undergoes several processing steps before each graphics frame is completely rendered for display or storage. Each processing step typically operates on graphics image data using programming steps defined through an application programming interface (API), enabling the graphics application to use high performance hardware, such as a graphics processing unit (GPU), to execute a set of processing steps with minimal real-time supervision from a host central processing unit (CPU). For example, a software application executing on the host CPU may use an API to program processing steps in a GPU, including physics, geometric transform, polygon setup, rasterization, and pixel shading, resulting in the generation of complex graphics image frames for display or storage with minimal impact on the performance of the host CPU.
Historically, computing devices have included only one GPU that was responsible for both processing graphics commands and displaying the resulting images. With only one GPU, the question of how to distribute work among multiple processing devices did not arise. However, as graphics applications implement more processing steps, with greater complexity in each step, the computational load on the GPU executing those steps increases, resulting in diminished overall rendering performance.
One approach to improving overall processing time has been to configure multiple GPUs to concurrently process a single graphics frame or to assign multiple GPUs to process alternating graphics frames. Such approaches generally involve synchronizing the GPUs to simultaneously render portions of the same frame or sequential frames to increase overall rendering performance. However, in current systems where multiple GPUs concurrently process a single frame, the graphics application has no way to inform the GPUs of the spatial locality of the processed image data. Consequently, all of the rendered data from each GPU has to be copied to all of the other GPUs to form a combined image, thereby limiting the overall system performance. This limitation applies in particular to texture data generated at run-time, where rendering commands sent to the GPUs cause the rendering results to be stored in texture map memory. In several common usage patterns, however, each section of the texture data is accessed by only a subset of the GPUs, so not all of the rendered texture data needs to be copied to every other GPU.
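By way of illustration only, the copy overhead described above may be estimated with the following minimal sketch. The sketch is not part of any described system; the function names, the division of a texture into four equally sized sections, the one-mebibyte section size, and the pattern in which each section is read by its rendering GPU and one neighboring GPU are all assumptions chosen for the example.

```python
from typing import Dict, Set

MIB = 1024 * 1024  # bytes per mebibyte


def broadcast_copy_bytes(rendered_by: Dict[int, int],
                         section_bytes: Dict[int, int],
                         num_gpus: int) -> int:
    """Bytes copied when every rendered section goes to every other GPU."""
    return sum(section_bytes[s] * (num_gpus - 1) for s in rendered_by)


def locality_aware_copy_bytes(rendered_by: Dict[int, int],
                              readers: Dict[int, Set[int]],
                              section_bytes: Dict[int, int]) -> int:
    """Bytes copied when each section goes only to the GPUs that read it."""
    total = 0
    for section, producer in rendered_by.items():
        # The rendering GPU already holds the data, so it needs no copy.
        consumers = readers.get(section, set()) - {producer}
        total += section_bytes[section] * len(consumers)
    return total


# Hypothetical configuration: 4 GPUs, 4 texture sections of 1 MiB each.
NUM_GPUS = 4
rendered_by = {s: s for s in range(NUM_GPUS)}       # section -> rendering GPU
section_bytes = {s: MIB for s in range(NUM_GPUS)}   # section -> size in bytes
# Assumed access pattern: each section is read by its own GPU and one neighbor.
readers = {s: {s, (s + 1) % NUM_GPUS} for s in range(NUM_GPUS)}

broadcast = broadcast_copy_bytes(rendered_by, section_bytes, NUM_GPUS)
targeted = locality_aware_copy_bytes(rendered_by, readers, section_bytes)

print(broadcast // MIB, "MiB copied when broadcasting")   # prints 12
print(targeted // MIB, "MiB copied with locality known")  # prints 4
```

Under these assumed values, copying every section to every other GPU moves three times as much data as copying each section only to the GPU that reads it. The ratio depends entirely on the assumed access pattern and would differ for other configurations.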
Accordingly, what is needed is a method of rendering texture data in a multi-GPU system that improves overall system performance.