Graphics processing subsystems are used to perform graphics rendering in modern computing systems such as desktops, notebooks, and video game consoles. Traditionally, graphics processing subsystems are implemented as either integrated graphics solutions or discrete video cards, and typically include one or more graphics processing units, or “GPUs,” which are specialized processors designed to efficiently perform graphics processing operations.
Integrated graphics solutions are graphics processors that utilize a portion of a computer's system memory rather than having their own dedicated memory. Due to this arrangement, integrated graphics subsystems are typically localized in close proximity to, if not disposed directly upon, some portion of the main circuit board (e.g., a motherboard) of the computing system. Integrated graphics subsystems are, in general, cheaper to implement than discrete video cards, but typically have lower capability and operate at reduced performance levels relative to discrete graphics processing subsystems.
Discrete or “dedicated” video cards are distinguishable from integrated graphics solutions by having local memory dedicated for use by the graphics processing subsystem, which is not shared with the underlying computer system. Commonly, discrete graphics processing subsystems are implemented on discrete circuit boards called “video cards,” which include, among other components, one or more GPUs, the local memory, communication buses, and various output terminals. These video cards typically interface with the main circuit board of a computing system through a standardized expansion slot, such as PCI Express (PCIe) or Accelerated Graphics Port (AGP), upon which the video card may be mounted. In general, discrete graphics processing subsystems are capable of significantly higher performance levels relative to integrated graphics processing subsystems. However, discrete graphics processing subsystems also consume more power than integrated graphics solutions, and consequently typically require their own separate power inputs and higher-capacity power supply units to function properly.
Some modern main circuit boards include two or more graphics subsystems. For example, common configurations include an integrated graphics processing unit as well as one or more additional expansion slots available to add a dedicated graphics unit. Each graphics processing subsystem can, and typically does, have its own output terminals with one or more ports corresponding to one or more audio/visual standards (e.g., VGA, HDMI, DVI), though typically only one of the graphics processing subsystems will be running in the computing system at any one time.
Alternatively, other modern computing systems can include a main circuit board capable of simultaneously utilizing two or more GPUs (on a single card) or even two or more individual dedicated video cards to generate output to a single display. In these implementations, two or more graphics processing units (GPUs) share the workload when performing graphics processing tasks for the system, such as rendering a three-dimensional scene. Ideally, two identical graphics cards are installed in a motherboard that contains two PCI Express ×16 slots, set up in a “master-slave” configuration. Both cards are given the same 3D scene to render, but a portion of the workload is effectively processed by the slave card, and the resulting image is sent to the master card through a connector called the GPU Bridge or through a communication bus (e.g., the PCI Express bus). For example, for a typical scene, the master card renders the top half of the scene while the slave card renders the bottom half. When the slave card has finished rendering its portion of the scene, it sends its entire output to the master card, which synchronizes and combines the two images to form one aggregated image and then outputs the final rendered scene to the display device. In recent developments, the portions of the scene rendered by the GPUs may be dynamically adjusted to account for differences in complexity of localized portions of the scene. However, this solution is designed to improve graphics rendering by increasing the processing capability for output on a single display, and is generally unsuitable and/or less effective for multi-display configurations.
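The master-slave arrangement described above can be illustrated with a minimal Python sketch. It is not taken from any particular driver or product: the function names (`render_rows`, `composite`, `rebalance`), the representation of a scene as a list of pixel rows, and the one-row adjustment step are all assumptions made for illustration only.

```python
def render_rows(scene, start, stop):
    """Stand-in for one GPU: 'render' rows [start, stop) by copying them.

    A real GPU would rasterize geometry; copying rows is a placeholder.
    """
    return [list(row) for row in scene[start:stop]]


def composite(master_part, slave_part):
    """Master card combines its own rows with the rows sent by the slave."""
    return master_part + slave_part


def rebalance(cut, height, master_time, slave_time, step=1):
    """Move the split row toward the slower GPU so that it gets less work.

    Hypothetical policy: shift the boundary by `step` rows per frame and
    always leave at least one row for each GPU.
    """
    if master_time > slave_time:
        cut -= step   # master was slower: shrink the top portion
    elif slave_time > master_time:
        cut += step   # slave was slower: shrink the bottom portion
    return max(1, min(height - 1, cut))


# A 6-row, 4-column toy "scene" split evenly between the two cards.
scene = [[r * 10 + c for c in range(4)] for r in range(6)]
cut = len(scene) // 2
top = render_rows(scene, 0, cut)               # master renders the top half
bottom = render_rows(scene, cut, len(scene))   # slave renders the bottom half
frame = composite(top, bottom)                 # master aggregates the image
```

In this sketch the static case corresponds to leaving `cut` fixed at half the scene height, while the dynamically adjusted case would call `rebalance` after each frame using measured render times.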
Even more recently, configurations featuring multi-GPU systems displaying output to multiple displays have come into use. In these systems, each graphics processing subsystem (and each GPU) is individually coupled to a display device, and the operating system of the underlying computer system and its executing applications perceive the multiple subsystems as a single, combined graphics subsystem with a total resolution equal to the sum of the areas rendered by each GPU. With the traditional multi-GPU techniques, each GPU renders a static partition of the combined scene and outputs the respective rendered part to its attached display. Typically, display monitors are placed next to each other (horizontally or vertically) to give the user the impression of using a single large display. Each display monitor thus displays a fraction (or “frame”) of the scene. Although each GPU renders its corresponding partition individually, a final synchronization among the GPUs is performed for each frame of the scene prior to the display (also known as a “present”) of the scene on the display devices.
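As a rough illustration of the static partitioning described above, the following Python sketch computes the combined resolution and the fixed region assigned to each GPU for displays placed side by side. The function name `static_partitions`, the viewport tuple layout, and the 1920×1080 display size are assumptions chosen for the example, not details from any specific system.

```python
def static_partitions(display_widths, height):
    """Lay out displays left to right and return the combined surface size
    plus one fixed viewport (x, y, width, height) per GPU/display pair."""
    viewports = []
    x = 0
    for width in display_widths:
        viewports.append((x, 0, width, height))
        x += width  # the next display starts where this one ends
    return (x, height), viewports


# Three hypothetical 1920x1080 displays arranged horizontally.
combined, viewports = static_partitions([1920, 1920, 1920], 1080)
```

Here the combined surface is as wide as the sum of the individual display widths, and each GPU's viewport never changes from frame to frame regardless of what the scene contains.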
However, the complexity of each portion of a scene is often widely disparate. Thus, if each GPU renders a static partition of the screen (e.g., the specific frame assigned to it by virtue of its display device), the time each GPU in the system takes to render its respective portion of the scene may vary greatly, depending on the disparity in the loads (i.e., the graphics complexity) between the GPUs. For example, in many video games, the most complex portion of a scene is typically the middle or focus of the scene. In three-display configurations, this often results in higher complexity in the portion of the scene displayed in the middle-oriented frame, and thus a heavier load for the GPU performing the rendering for the middle display relative to the peripheral displays. As a result, the graphics processing subsystems or GPUs performing the rendering for the peripheral displays will often finish rendering their respective frames of a scene before the GPU with the heavier load, and will remain idle while the heavily loaded GPU completes the rendering for its frame of the scene, after which the frames are synchronized and then delivered individually from each GPU to its coupled display device. Because this idleness may recur for every scene, it can lead to significant inefficiency and an adverse effect on the user's graphical experience.
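The cost of this idleness can be made concrete with a small Python sketch. Because the GPUs synchronize before each present, the slowest GPU sets the frame time and every other GPU waits for it. The function names and the per-GPU render times below are hypothetical values chosen only to illustrate a three-display configuration with a heavily loaded middle display.

```python
def idle_times(render_times_ms):
    """Return the frame time (set by the slowest GPU) and each GPU's idle time."""
    frame_time = max(render_times_ms)
    return frame_time, [frame_time - t for t in render_times_ms]


def utilization(render_times_ms):
    """Fraction of the total available GPU time actually spent rendering."""
    frame_time = max(render_times_ms)
    return sum(render_times_ms) / (len(render_times_ms) * frame_time)


# Hypothetical render times: left, middle, and right displays.
times = [6.0, 14.0, 6.0]
frame_time, idle = idle_times(times)
busy_fraction = utilization(times)
```

With these assumed numbers, the two peripheral GPUs each sit idle for more than half of every frame while the middle GPU finishes, which is the inefficiency the paragraph above describes.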