Graphics systems are typically implemented as a three-dimensional assembly of different cards (also sometimes called “boards”) that are plugged into a motherboard. The motherboard is the main circuit board of the system and typically includes a central processing unit and other chips that are known as a “chipset.” Additionally, a motherboard includes connectors, ports, and other features for attaching other electronic components.
Referring to FIG. 1, in a conventional graphics system a motherboard 100 includes a chipset that includes, for example, a bridge unit 110 and a central processing unit (CPU) 120. For the purposes of illustration, a graphics card 130 is illustrated in position for assembly. Graphics card 130 typically includes a graphics processing unit (GPU) (not shown). The graphics card 130 typically includes connector surfaces 135. For the purposes of illustration, a single connector surface 135 is illustrated that is designed to mate with a Peripheral Component Interconnect (PCI) Express (often referred to as “PCI-E” or “PCIe”) connector 140. PCI-E is a high-speed bus interface standard that utilizes high-speed serial data lanes. The PCI-SIG organization publishes the PCI-E standard. An individual data lane 150 comprises two simplex connections, one for receiving data and the other for transmitting data.
The PCI-E standard specifies a protocol for bus interfaces to configure a set of data lanes into a link between two entities. The bandwidth of the link scales with the number of data lanes operated in parallel. The size of a PCI-E bus is commonly referred to as a multiple of one data lane, e.g., “×N” or “N×” to indicate that the link has N times the bandwidth of a single data lane. PCI-E supports bus sizes of ×1, ×2, ×4, ×8, ×16, and ×32 lanes. Conventionally, a variety of standard connector sizes are utilized, with a ×16 connector size being commonly used for graphics cards.
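By way of illustration only, the scaling of link bandwidth with lane count may be sketched as follows. The function name and the per-lane figure of 250 MB/s per direction (a first-generation PCI-E value) are assumptions introduced for this example and are not taken from the description above.

```python
# Illustrative sketch only: link bandwidth as a multiple of one data lane.
# ASSUMPTION: 250 MB/s per lane, per direction (first-generation PCI-E figure).
PER_LANE_MB_S = 250

# Bus sizes listed in the description above.
SUPPORTED_WIDTHS = (1, 2, 4, 8, 16, 32)


def link_bandwidth_mb_s(lanes, per_lane_mb_s=PER_LANE_MB_S):
    """Return the per-direction bandwidth of an xN link in MB/s."""
    if lanes not in SUPPORTED_WIDTHS:
        raise ValueError(f"unsupported link width: x{lanes}")
    # An xN link has N times the bandwidth of a single data lane.
    return lanes * per_lane_mb_s


if __name__ == "__main__":
    for width in SUPPORTED_WIDTHS:
        print(f"x{width}: {link_bandwidth_mb_s(width)} MB/s per direction")
```

Under the assumed per-lane rate, a ×16 link carries twice the bandwidth of a ×8 link, which is the relationship relied upon in the discussion of SLI that follows.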
FIG. 2 illustrates a scalable link interface (SLI) graphics system similar to that developed by the Nvidia Corporation of Santa Clara, Calif. An SLI graphics system utilizes two or more graphics cards 130-A and 130-B operating together to produce a single output. That is, the graphics cards process graphics data in parallel. For example, two PCI-E ×16 connectors 140-A and 140-B may be provided on the motherboard 100, one for each graphics card 130-A and 130-B. A PCI-E ×16 bus (e.g., one ×16 bus from a chip 110) is split into two ×8 buses, with one ×8 bus going to each graphics card. Typically, a switch card 170 (also known as a “paddle card”) is provided to determine which of the lanes of the ×16 bus from chip 110 are routed to the two PCI-E connectors 140-A and 140-B. The switch card 170 essentially amounts to an additional PCI-E connector which further includes a switching element. This switch card 170 typically has two positions: a first position in which all sixteen lanes from chip 110 are routed to one PCI-E connector (such as PCI-E connector 140-A), and a second position in which eight lanes are routed from chip 110 to PCI-E connector 140-A and the other eight lanes from chip 110 are routed to PCI-E connector 140-B. Thus, in an SLI mode each PCI-E connector has half of its serial data lanes coupled to the chipset, while the other half are unused. This results in an inherent compromise in that graphics processing power is increased (because of the two GPUs operating in parallel), but at the cost that each graphics card has half of the PCI-E bandwidth that it would have if it were used alone.
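By way of illustration only, the two positions of the switch card 170 may be modeled as a mapping from lanes of chip 110 to connectors. The function name, the lane numbering, and the choice of which eight lanes go to which connector are assumptions made for this sketch; the description above does not specify them.

```python
# Illustrative sketch only: lane routing for the two switch card positions.
# ASSUMPTION: lanes are numbered 0..15 and the lower half goes to 140-A.

def route_lanes(sli_mode, total_lanes=16):
    """Return the chip 110 lanes routed to each PCI-E connector."""
    lanes = list(range(total_lanes))
    if not sli_mode:
        # First position: all lanes are routed to a single connector.
        return {"140-A": lanes, "140-B": []}
    # Second position: the lanes are divided evenly between the connectors.
    half = total_lanes // 2
    return {"140-A": lanes[:half], "140-B": lanes[half:]}


if __name__ == "__main__":
    for mode in (False, True):
        routing = route_lanes(mode)
        widths = {name: len(assigned) for name, assigned in routing.items()}
        print("SLI mode" if mode else "single-card mode", widths)
```

In the second position each ×16 connector receives only eight active lanes, which is the halving of per-card bandwidth noted above.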
SLI is typically implemented in a master/slave arrangement in which work is divided up between graphics processors. Software drivers distribute the work of processing graphics data between the two graphics cards. For example, in split frame rendering (SFR) the graphics processing is organized such that an individual frame is split into two different portions, which are processed by the different graphics processors in parallel. In alternate frame rendering (AFR), one graphics card processes the current frame while the other graphics card works on the next frame. In one version, an external SLI connector 180 provides a link between the graphics cards to transmit synchronization and pixel data between the graphics cards.
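By way of illustration only, the two work-division schemes may be sketched as follows. The function names are hypothetical, and the even division of a frame into horizontal bands is an assumption made for simplicity; an actual driver may divide the frame differently.

```python
# Illustrative sketch only: dividing work between graphics processors.
# ASSUMPTION: SFR divides the frame into equal-height horizontal bands.

def split_frame(height, num_gpus=2):
    """SFR: return one (start_row, end_row) band per GPU, end exclusive."""
    base, extra = divmod(height, num_gpus)
    bands = []
    start = 0
    for gpu in range(num_gpus):
        # Spread any remainder rows over the first few GPUs.
        rows = base + (1 if gpu < extra else 0)
        bands.append((start, start + rows))
        start += rows
    return bands


def alternate_frame(frame_index, num_gpus=2):
    """AFR: return the index of the GPU that renders the given frame."""
    return frame_index % num_gpus


if __name__ == "__main__":
    print("SFR bands for a 1080-row frame:", split_frame(1080))
    print("AFR owners for frames 0..3:", [alternate_frame(i) for i in range(4)])
```

In SFR both processors contribute to every frame, whereas in AFR each processor renders complete frames in turn.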
Recently, quad SLI systems that include four graphics cards have been released by the Nvidia Corporation. A quad SLI system is an extension of SLI in which four graphics cards process graphics data. For example, the work may be split into a combination of AFR and SFR in which groups of two graphics cards work on alternate frames, with each group of two graphics cards in turn performing split frame rendering.
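By way of illustration only, the combined arrangement may be sketched as a function that maps a frame to the group of two graphics cards responsible for it and to the portion of the frame handled by each card in that group. The function name, the card numbering 0 through 3, and the even top/bottom division are assumptions made for this sketch.

```python
# Illustrative sketch only: combined AFR and SFR across four graphics cards.
# ASSUMPTION: cards 0-1 form one group and cards 2-3 form the other.

def quad_assignment(frame_index, height):
    """Return {card_index: (start_row, end_row)} for the given frame."""
    # AFR between the two groups: groups take alternate frames.
    group = frame_index % 2
    first_card = 2 * group
    # SFR within the group: each card renders one half of the frame.
    half = height // 2
    return {first_card: (0, half), first_card + 1: (half, height)}


if __name__ == "__main__":
    for frame in range(4):
        print(f"frame {frame}:", quad_assignment(frame, 1080))
```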
One problem with conventional SLI is that it is more expensive than desired. In particular, extra components, such as switch cards and SLI connectors, are typically required, increasing the cost. Another problem is the performance penalty caused by splitting the PCI-E bandwidth of chip 110 between two graphics cards. The bandwidth from the chipset to each GPU is reduced by half compared to a single graphics card architecture. This also limits the bandwidth available for GPU-to-GPU traffic that flows through the chipset.
As illustrated in FIG. 3, one alternative to conventional SLI would be to use a more expensive set of chips 305, 310 in the chipset to increase the PCI-E bandwidth such that each GPU 320-A and 320-B has a dedicated ×16 bandwidth to the chipset. However, in addition to requiring a more expensive chipset, the architecture illustrated in FIG. 3 does not have symmetric data paths 350 and 360 from the CPU 302 to the GPUs. Command streams from the CPU may thus arrive at each GPU at slightly different times. As a result, greater care must be taken with regard to synchronization of the operation of the GPUs 320-A and 320-B than for the case of symmetric data pathways. Alternatively, as illustrated in FIG. 4, an SLI architecture with a more expensive chipset of chips 402 and 404 might be used to increase the PCI-E bandwidth allocated to each GPU 420. For example, chips 404 with a ×32 PCI-E interface may be included to support each pair of GPUs 420 with a ×16 bus. However, for many market segments the increased performance gained by adding chips or using more expensive chips does not justify the additional chip cost.
Therefore, in light of the above-described problems, the apparatus, system, and method of the present invention were developed.