The present invention relates generally to graphics processing subsystems with multiple processors, and in particular to private addressing for such graphics processing subsystems.
Graphics processing subsystems are designed to render realistic animated images in real time, e.g., at 30 or more frames per second. These subsystems are most often implemented on expansion cards that can be inserted into appropriately configured slots on a motherboard of a computer system and generally include one or more dedicated graphics processing units (GPUs) and dedicated graphics memory. The typical GPU is a highly complex integrated circuit device optimized to perform graphics computations (e.g., matrix transformations, scan-conversion and/or other rasterization techniques, texture blending, etc.) and write the results to the graphics memory. The GPU is a “slave” processor that operates in response to commands received from programs executing on a “master” processor, generally the central processing unit (CPU) of the system.
To meet the demands for realism and speed, some modern GPUs include more transistors than typical advanced CPUs. In addition, modern graphics memories have become quite large in order to improve speed by reducing traffic on the system bus; some cards now boast as much as 256 MB of memory. But despite these advances, a demand for even greater realism and faster rendering persists.
As one approach to meeting this demand, some manufacturers have begun to develop “multi-chip” graphics processing subsystems in which two or more GPUs operate in parallel on the same card. Parallel operation substantially increases the number of rendering operations that can be carried out per second without requiring significant advances in GPU design. To minimize resource conflicts between the GPUs, each GPU is generally provided with its own dedicated memory area (referred to herein as a “local memory”).
Ideally, each local memory is as large as the total memory of a single-chip graphics subsystem; thus, for a two-chip card with 256 MB per GPU, it might be desirable to provide 512 MB (or more) of memory. Unfortunately, in conventional personal computer systems, the total memory of a multi-chip card can easily exceed the address space allotted to the graphics subsystem. For instance, one common addressing scheme provides a 4 GB global address space in which addresses can be expressed as 32-bit unsigned integers. Each expansion slot is allocated a specific 256 MB range within that address space. If a multi-chip card occupying one expansion slot includes 512 MB of memory, then not all of this memory can be assigned unique physical addresses. One solution is to design a “multi-card” subsystem that occupies two (or more) expansion slots, allowing each memory location to have its own address, but this is often undesirable, as expansion slots may be a limited resource and bus speeds may be too slow to support the needed rate of communication between the cards.
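The arithmetic behind this shortfall can be sketched as follows. This is only an illustrative calculation using the figures given above (32-bit addresses, a 256 MB range per slot, a 512 MB card); the names `SLOT_WINDOW` and `unaddressable_bytes` are hypothetical and do not come from any real system.

```python
# Illustrative arithmetic only; all names here are hypothetical.
MB = 1 << 20
GB = 1 << 30

ADDRESS_BITS = 32
GLOBAL_SPACE = 1 << ADDRESS_BITS   # 4 GB of 32-bit addresses
SLOT_WINDOW = 256 * MB             # range allotted to one expansion slot


def unaddressable_bytes(card_memory, window=SLOT_WINDOW):
    """Bytes of card memory that cannot receive a unique physical address."""
    return max(0, card_memory - window)


# Number of 256 MB ranges that fit in the global address space.
windows_in_space = GLOBAL_SPACE // SLOT_WINDOW   # 16

# A two-chip card with 256 MB per GPU has 512 MB in total.
shortfall = unaddressable_bytes(2 * 256 * MB)    # 256 MB
```

Under these assumptions, half of the memory on the two-chip card has no unique physical address of its own.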
Another solution has been to permit duplication of memory address associations within the graphics subsystem. For example, if the local memory of each GPU includes 256 MB, one memory address can be mapped to a location in each of the local memories. This allows the CPU (or another external system component) to access the local memories in parallel. For example, in response to a write request, circuitry inside the graphics card can broadcast the data to each local memory. Read requests can also be handled by broadcasting the request to a set of memory interfaces, each associated with one of the local memories and configured to determine whether its associated local memory should respond to a given request.
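A minimal sketch of such duplicate mapping is given below. It is a simplified software model, not a description of any actual card circuitry: the class names, the `responds_to_reads` flag, and the choice of which interface answers a read are all assumptions made for illustration.

```python
# Simplified model of duplicate address mapping; all names are hypothetical.
class MemoryInterface:
    """One local memory plus the logic deciding whether it answers reads."""

    def __init__(self, size, responds_to_reads):
        self.memory = bytearray(size)
        self.responds_to_reads = responds_to_reads  # assumed configuration bit

    def write(self, offset, data):
        self.memory[offset:offset + len(data)] = data

    def read(self, offset, length):
        if not self.responds_to_reads:
            return None  # this interface stays silent for the request
        return bytes(self.memory[offset:offset + length])


class DuplicateMappedCard:
    """Card whose single address window maps onto every local memory."""

    def __init__(self, base, size, gpu_count=2):
        self.base = base
        self.size = size
        # Assumption: only the first interface is configured to answer reads.
        self.interfaces = [
            MemoryInterface(size, responds_to_reads=(i == 0))
            for i in range(gpu_count)
        ]

    def _offset(self, address, length):
        offset = address - self.base
        if offset < 0 or offset + length > self.size:
            raise ValueError("address outside the card's window")
        return offset

    def cpu_write(self, address, data):
        # Broadcast: the same data lands at the same offset in each memory.
        offset = self._offset(address, len(data))
        for iface in self.interfaces:
            iface.write(offset, data)

    def cpu_read(self, address, length):
        # Broadcast the request; keep the reply from whichever interface answers.
        offset = self._offset(address, length)
        replies = [iface.read(offset, length) for iface in self.interfaces]
        return next(r for r in replies if r is not None)
```

In this model a single CPU write updates both local memories, which is what allows the CPU to treat the duplicated memories as one.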
While use of duplicate addresses does not prevent the CPU from accessing the graphics memory, the duplication makes it more difficult for any of the GPUs to access data stored in “remote” graphics memories (i.e., any graphics memory other than its own local memory). For example, in a two-chip card, an address in the first GPU's local memory is generally also an address in the remote memory (i.e., the second GPU's local memory). Because a GPU more often needs to access its own local memory, such an address is typically interpreted as referring to the local memory, not the remote memory.
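The consequence of this interpretation can be illustrated with a short sketch. The function `resolve` and its parameters are hypothetical; the sketch assumes a single shared window of `size` bytes starting at `base`, with no mechanism for naming a remote memory.

```python
# Hypothetical model of address interpretation under duplicate mapping.
def resolve(requesting_gpu, address, base, size):
    """Return (gpu_index, offset) of the memory location an address refers to.

    Because every local memory shares the same window, the address carries
    no information about which memory is meant, so the request is always
    resolved to the requester's own local memory.
    """
    offset = address - base
    if not 0 <= offset < size:
        raise ValueError("address outside the graphics window")
    return requesting_gpu, offset


BASE = 0xD0000000          # assumed window base, for illustration only
SIZE = 256 * (1 << 20)     # 256 MB window

local_for_gpu0 = resolve(0, BASE + 0x40, BASE, SIZE)   # (0, 0x40)
local_for_gpu1 = resolve(1, BASE + 0x40, BASE, SIZE)   # (1, 0x40)
```

The same address yields a different memory for each requester, and no address in the window ever resolves to a remote memory.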
In such systems, data transfers between different graphics memories generally require an indirect path. For example, data in a first graphics memory can be transferred to a location in an off-card memory (e.g., the main system memory), then transferred again from the off-card memory to a location in a second graphics memory. This process is undesirably slow because two transfers are involved and because the data has to be transmitted via the system bus twice: from the graphics card to the off-card memory, then from the off-card memory back to the graphics card.
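The cost of the indirect path can be made concrete with a small model. This is a sketch under simplifying assumptions: memories are plain byte arrays, the function name `indirect_transfer` is hypothetical, and system bus traffic is counted simply as the number of bytes crossing the bus.

```python
# Hypothetical model of an indirect transfer staged through off-card memory.
def indirect_transfer(src_memory, dst_memory, system_memory,
                      src, dst, staging, length):
    """Copy `length` bytes between graphics memories via system memory.

    Returns the number of bytes that crossed the system bus.
    """
    bus_bytes = 0

    # First transfer: graphics card to off-card memory.
    system_memory[staging:staging + length] = src_memory[src:src + length]
    bus_bytes += length

    # Second transfer: off-card memory back to the graphics card.
    dst_memory[dst:dst + length] = system_memory[staging:staging + length]
    bus_bytes += length

    return bus_bytes


first_graphics_memory = bytearray(b"texture-data" + bytes(52))
second_graphics_memory = bytearray(64)
main_system_memory = bytearray(128)

traffic = indirect_transfer(first_graphics_memory, second_graphics_memory,
                            main_system_memory, src=0, dst=16, staging=32,
                            length=12)
```

In this model the bus carries twice as many bytes as are actually moved between the graphics memories.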
It would, therefore, be desirable to enable direct transfers from one memory of a multi-chip graphics subsystem to another, without requiring that the data be transferred off the graphics card.