A graphics processing unit or “GPU” is a device used to perform graphics rendering operations in modern computing systems such as desktops, notebooks, and video game consoles, etc. Traditionally, graphics processing units are typically implemented as either integrated units or within discrete video cards.
Integrated graphics processing units are graphics processors that utilize a portion of a computer's system memory rather than having its own dedicated memory. Due to this arrangement, integrated GPUs (or “iGPUs”) are typically localized in close proximity to, if not disposed directly upon, some portion of the main circuit board (e.g., a motherboard) of the computing system. Integrated GPUs are, in general, cheaper to implement than discrete GPUs, but are typically lower in capability and operate at reduced performance levels relative to discrete GPUs.
Discrete or “dedicated” GPUs (or “dGPUs”) are distinguishable from integrated GPUs by having local memory dedicated for use by the GPU which they do not share with the underlying computer system. Commonly, discrete GPUs are implemented on discrete circuit boards called “video cards” which include, among other components, a GPU, local memory units, an interface with one or more communication buses and various output terminals. These video cards typically interface with the main circuit board of a computing system through an interface of a standardized expansion slot such as PCI Express (PCI-e) or Accelerated Graphics Port (AGP), upon which the video card may be mounted. In general, discrete GPUs are capable of significantly higher performance levels relative to integrated GPUs. However, discrete GPUs also typically require their own separate power inputs, and require higher capacity power supply units to function properly. Consequently, discrete GPUs also have higher rates of power consumption relative to integrated graphics solutions.
Some modern main circuit boards often include an integrated graphics processing unit as well as one or more additional expansion slots available to add a dedicated graphics unit. Each GPU can and typically does have its own output terminals with one or more ports corresponding to one or more audio/visual standards (e.g., VGA, HDMI, DVI, etc.), though typically only one of the GPUs will be running in the computing system at any one time. Alternatively, other modern computing systems can include a main circuit board capable of simultaneously utilizing two identical dedicated graphics units to generate output for one or more displays.
Some notebook and laptop computers have been manufactured to include two or more graphics processors. Notebook and laptop computers with more than one graphics processing units are almost invariably solutions featuring an integrated GPU and a discrete GPU. Portable computing devices with both integrated and discrete graphics processing solutions often offer a mechanism or procedure that enables the user to alternate usage between the particular solutions so as to manage performance and battery life according to situational needs or desired performance levels. Recently, the PCI Express expansion slot interface has become a dominant interface standard through which discrete GPUs are coupled to the main circuit boards of mobile computing devices. However, unlike PCI-e interfaces in other computing systems such as desktops, the PCI-e interface of a portable computing device is often of a reduced size and, naturally, of a reduced capacity. In a typical configuration, the PCI-e interface of any computing device comprises a plurality of links, with each link comprising a further plurality of “lanes,” and with each link being configured to independently couple to a peripheral device. The number of lanes in a link coupled to a peripheral device correlates with the bandwidth of the connection, and thus, couplings between a peripheral device and a link with larger amounts of lanes have greater bandwidth than couplings with links comprised of only single lanes. Traditionally, the number of links in a PCI-e interface of a portable computing device may be configured by the manufacturer in separate configurations to suit specific hardware implementations.
In a popular configuration, the links in a PCI-e interface of a portable computing device may be arranged in either of two combinations totaling up to four lanes. For example, implementations can comprise either a single link of four lanes (1×4), thereby offering relatively greater bandwidth for a coupled device. Alternatively, implementations may feature four separate links, with each link capable of being coupled to a separate device but limited to a single lane (4×1) with a correspondingly low bandwidth. Thus, whenever the PCI-e interface is coupled to one device, the single link (1×4) configuration may be optimal, but multiple devices require additional links that adversely impact the amount of bandwidth and throughput of each connection.
Unfortunately, since netbooks and laptops are often intended to be used with network connections, chipset manufacturers of computing devices that will include a discrete GPU will invariably manufacture circuit boards with PCI-e interfaces having four separate links of one lane each, one of which is occupied by a network controller (e.g., a network interface card). This results in the extremely inefficient configuration wherein only one link is coupled to the graphics processing unit, another link is coupled to the network controller, and the other two links remain unoccupied (or coupled to additional devices). While the bandwidth from a link with only one lane may be sufficient to run certain applications on certain devices, for usage in graphics processing a link having only a single lane is often insufficient and likely to drastically and adversely impact the performance of the discrete graphics processing unit.
According to typical graphics rendering processes, single units of images displayed to a user during a graphical sequence (e.g., a video) during the execution of an application are arranged as frames. Each frame is produced by sending graphics rendering instructions from the executing application to a GPU for rendering. Once a frame has completed rendering, the GPU will store the completed frame in one or more frame buffers. Generally, the size of a GPU's frame buffers is static and comprised in the local memory of the GPU. However, the size of the data contained in a rendered frame can often vary widely between applications. In general, higher resolutions are preferable for many applications. Higher resolutions also increase the size of the rendered frames. This may not be a concern when the application produces relatively simple graphical output (e.g., typical word processing applications). However, 3D gaming applications are generally graphically intensive and, when displayed at a sufficiently high resolution, a rendered frame may be large enough such that the remaining space available in the frame buffer may not be sufficient to store additional graphics resources (e.g., textures).
Typically, when the size of a rendered frame consumes a large amount of space in the frame buffer, those additional graphics resources may be stored in the system memory. The extra data is communicated (e.g., copied) to the system memory through the coupling communication bus (typically, the PCI-e bus). However, when the bandwidth of the PCI-e interface is limited, as in single lane link architectures, due to the limited speed of data transfer rates, transferring the data between the memory of the GPU and system memory when accessing the graphics resources will add considerably to the duration of the graphics rendering process. This can adversely affect the user's graphical experience by creating significant delays and severely crippling the rate at which scenes or images may be displayed to the user (e.g., the application's “frame rate”). In 3D gaming applications which can be extremely time sensitive, even slight delays can be a nuisance, with significant delays potentially becoming a significant problem.