A graphics processing unit (GPU) is a dedicated graphics rendering device for a personal computer, workstation, or game console. Modern GPUs are very efficient at manipulating and displaying computer graphics, and their highly parallel structure makes them more effective than typical CPUs for a range of complex algorithms. A GPU implements graphics primitive operations in a way that makes running them much faster than drawing directly to the screen with the host CPU.
Multiple GPU systems use two or more separate GPUs, each of which generates part of a graphics frame or alternate frames. The work of multiple GPUs is mixed together into a single output to drive a display.
When multiple GPUs work together their overall performance depends in part on the speed and efficiency of data transfers between GPUs. In today's multi-GPU systems, multiple reads and writes from one device to another over the system bus (e.g. the PCI Express bus) must all be either strictly ordered or all allowed to be unordered. There is no distinction for sub-device sources and destinations. In other words there is no support for multiple independent data streams each with its own rules. Furthermore there is no inherent mechanism for determining whether or not a particular write stream has completed. The PCI Express bus provides only a restricted number of prioritized traffic classes for quality of service purposes.
Memory-to-memory transfers between devices are more efficient when the memory is mapped into system bus space. In some cases, however, mapping is not possible because the size of the aperture available for peer to peer transfers is less than the size of the memory to be mapped. One or more software programmable offsets may be used to provide windows into a larger memory space. However, this approach does not work when a single chunk of memory exceeds the window size.