1. Field of the Invention
The present invention provides a computer configured to effectuate the use of multiple, off-the-shelf video cards, working in parallel.
2. Discussion of the Related Art
Consumers need and desire continual improvements in computer graphics performance. For instance, computers are increasingly used as digital entertainment hubs in the home to perform an array of demanding content creation and data manipulation tasks, including video editing and encoding, complex image processing, HDTV decoding, multichannel audio capture and playback, and of course far more realistic 3-D gaming. Furthermore, greater Internet bandwidth capabilities through the adoption of various high-speed access technologies have resulted in the increased importance of graphics-based processing in online activities. For instance, online merchants provide increasing amounts of visual information to consumers who rely on the visual accuracy of the images in making purchasing decisions. The list goes on, including applications like true voice recognition and synthesis, robust and accurate biometrics, and advanced encryption. High-end computers and workstations are also used by professionals for more computer-intensive scientific and engineering calculations, visualization and simulation, film-quality 3-D animation and rendering, advanced financial modeling, and numerous other heavy-duty chores.
Known methods for improving computer graphics performance are described below. In general, these improvements in computer graphics performance are achieved through developments in video card technology and enhancements in computer system architecture to maximize the gains in the video card performance.
Video Cards
Even before the widespread use of personal computers, computer graphics has been one of the most promising, and most challenging, aspects of computing. The first graphics personal computers developed for mass markets relied on the main central processing unit (“CPU”) to control every aspect of graphics output. Graphics boards, or video cards, in early systems acted as simple interfaces between the CPU and the display device and did not conduct any processing of their own. In other words, these early video cards simply translated low level hardware commands issued by the CPU into analog signals which the display devices transformed into on-screen images. Because all of the processing was conducted by the CPU, graphics-intensive applications had a tendency to over-utilize processing cycles and prevent the CPU from performing other duties. This led to overall sluggishness and degraded system performance.
To offload the graphics workload from the CPU, hardware developers introduced video cards equipped with a Graphics Processing Unit (“GPU”). GPUs are capable of accepting high level graphics commands and processing them internally into the video signals required by display devices. By way of an extremely simplistic example, if an application requires a triangle to be drawn on the screen, rather than requiring the CPU to instruct the video card where to draw individual pixels on the screen (i.e., low level hardware commands), the application could simply send a “draw triangle” command to the video card, along with certain parameters (such as the location of the triangle's vertices), and the GPU could process such high level commands into a video signal. In this fashion, graphics processing previously performed by the CPU is now performed by the GPU. This innovation allows the CPU to handle non-graphics related duties more efficiently.
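The difference between the two command levels in the triangle example can be illustrated with a minimal sketch. It does not represent any actual video card interface; the function names and the edge-function rasterization used to enumerate the pixels are assumptions chosen only for illustration.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def edge(a: Point, b: Point, p: Point) -> int:
    """Signed area test: which side of the line a->b the point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def triangle_pixels(v0: Point, v1: Point, v2: Point) -> List[Point]:
    """Enumerate every pixel covered by the triangle (edges included).

    This stands in for the per-pixel work that a CPU would have to
    issue as low level commands when the video card has no GPU.
    """
    xs = (v0[0], v1[0], v2[0])
    ys = (v0[1], v1[1], v2[1])
    pixels = []
    for y in range(min(ys), max(ys) + 1):
        for x in range(min(xs), max(xs) + 1):
            p = (x, y)
            w0, w1, w2 = edge(v1, v2, p), edge(v2, v0, p), edge(v0, v1, p)
            # Accept either winding order of the vertices.
            if (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0):
                pixels.append(p)
    return pixels


if __name__ == "__main__":
    vertices = ((0, 0), (4, 0), (0, 4))
    low_level_commands = len(triangle_pixels(*vertices))  # one write per pixel
    high_level_commands = 1  # a single hypothetical "draw triangle" command
    print(low_level_commands, high_level_commands)
```

Even for this very small triangle, the per-pixel approach requires many more commands from the CPU than the single high level command, and the gap grows with the area of the triangle.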
The primary drawback with early GPU-based video cards was that there was no set standard for the “language” of the various high level commands that the GPUs could interpret and then process. As a result, every application that sought to utilize the high level functions of a GPU-based video card required a specialized piece of software, commonly referred to as a driver, which could understand the GPU's language. With hundreds of different GPU-based video cards on the market, application developers became bogged down in writing these specialized drivers. In fact, it was not uncommon for a particularly popular software program to include hundreds, if not thousands, of video card drivers with its executable code. This, of course, greatly slowed the development and adoption of new software. This language problem was resolved when modern computer operating systems adopted standardized methods of video card interfacing. As a result, modern operating systems, such as the Windows® based operating system (sold by Microsoft Corporation of Redmond, Wash.), require only one hardware driver to be written for a video card. An intermediate software layer called an Application Programming Interface (“API”) mediates interaction between the various software applications, the CPU and the video card. Consequently, all that is required is that the video drivers and the applications be able to interpret a common graphics API. The two most common graphics APIs in use in today's personal computers are DirectX®, also distributed by Microsoft Corporation, and OpenGL®, distributed by a consortium of other computer hardware and software interests.
Since the advent of the GPU-based graphics processing subsystem, most efforts to increase the throughput of personal computer graphics subsystems (i.e., make the subsystem process information faster) have been geared, quite naturally, toward producing more powerful and complex GPUs, and optimizing and increasing the capabilities of their corresponding APIs.
The graphics performance of a computer may also be improved through the use of multiple video cards, each with one or more GPUs, processing graphics data in parallel. For example, co-pending and commonly assigned U.S. patent application Ser. No. 10/620,150 entitled MULTIPLE PARALLEL PROCESSOR COMPUTER GRAPHICS SYSTEM, the subject matter of which is hereby incorporated by reference in full, describes a scheme in which the display screen is divided into separate sections, and separate video cards are dedicated to the graphics processing in each of the display sections. It should be appreciated that numerous other technologies and methodologies for improving graphics performance are also known, as described in the background section of U.S. patent application Ser. No. 10/620,150.
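The idea of dedicating each video card to one section of the display can be sketched as follows. The sketch assumes, purely for illustration, that the sections are horizontal bands of nearly equal height; the referenced application is not stated here to divide the screen in this way, and the function name is hypothetical.

```python
from typing import List, Tuple


def split_screen(height: int, num_cards: int) -> List[Tuple[int, int]]:
    """Assign each card a band of rows as (first_row, end_row_exclusive).

    Assumption: bands are horizontal and as equal in height as possible;
    any leftover rows are given to the first cards, one row each.
    """
    if num_cards < 1 or height < num_cards:
        raise ValueError("need at least one row per card")
    base, extra = divmod(height, num_cards)
    bands = []
    start = 0
    for card in range(num_cards):
        rows = base + (1 if card < extra else 0)
        bands.append((start, start + rows))
        start += rows
    return bands


if __name__ == "__main__":
    # Two hypothetical cards sharing a 768-row display.
    for card, (first, end) in enumerate(split_screen(768, 2)):
        print(f"card {card}: rows {first} to {end - 1}")
```

Under these assumptions every row of the display belongs to exactly one card, so the cards can process their sections in parallel without overlapping work.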
Improvements in Computer Architecture
A computer historically comprises a CPU that communicates with various other devices via a set of parallel conductors called a bus. When first introduced, computers had only one bus and were thus called single bus systems. As depicted in FIG. 1, a bus generally includes control lines, address lines and data lines that, combined, allow the CPU to oversee the performance of various operations (e.g., read or write) by the attached devices. Specifically, the CPU uses the control lines to control the operations of the attached devices and the address lines to reference certain memory locations within the device. The data lines then provide an avenue for data transferred to or from a device.
Originally, most buses were set to run at a specified speed, measured in hertz or cycles per second. The CPU and the other various devices attached to the bus transferred data at different speeds, some faster than others. If the bus speed were unregulated, the different transfer speeds of the various components could cause communication problems. Specifically, data transfer errors occur when relatively slower communicating components miss or lose messages from other components. To avoid this problem, the bus clock speed was set sufficiently low that all the components could communicate relatively error free through the bus.
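The constraint described above amounts to running the shared bus no faster than its slowest attached device. A minimal sketch follows; the device names and clock rates are hypothetical values chosen only to illustrate the relationship.

```python
from typing import Dict


def safe_bus_clock_mhz(device_max_mhz: Dict[str, float]) -> float:
    """Fastest single clock at which every attached device can keep up."""
    return min(device_max_mhz.values())


def unused_headroom_mhz(device_max_mhz: Dict[str, float]) -> Dict[str, float]:
    """How much of each device's capability the shared clock leaves idle."""
    bus = safe_bus_clock_mhz(device_max_mhz)
    return {name: rate - bus for name, rate in device_max_mhz.items()}


if __name__ == "__main__":
    # Hypothetical devices sharing one bus (rates in MHz).
    devices = {"serial port": 8.0, "disk controller": 16.0, "video card": 33.0}
    print(safe_bus_clock_mhz(devices))
    print(unused_headroom_mhz(devices))
```

In this hypothetical case the video card is held to the rate of the slowest device, which is the performance limitation that a shared, fixed-speed bus imposes on faster components.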
This configuration, however, creates significant performance limitations, because data transfer rates are restricted to the levels of the slowest communicating components on the bus, thus preventing the relatively faster devices from realizing their full potential. The overall system performance could be improved by increasing the throughput (data transfer rates) for all of the devices on the bus and by similarly increasing the fixed bus speed. However, the system-wide improvement is relatively complex and expensive to implement.
To address the above-described problems, a multi-bus configuration may be used. In a multi-bus configuration, faster devices are placed on separate, higher speed buses linked directly to the processor, thus allowing these high throughput devices to work more productively. For instance, it is common to have a separate local bus for graphics processors and other high throughput devices. This configuration thereby allows the high throughput devices to communicate without hindrance from the limitations of other devices.
There are several known ways to create a faster bus. As suggested above, increasing the speed of the bus (clock speed) allows more data transfers to take place within a certain time. The capacity of the bus may also be increased by increasing the width of the bus (i.e., increasing the amount of information being transferred on the bus at a particular instant). Referring back to FIG. 1, an increase in the number of address lines would effectively increase the number of addressable memory locations. Similarly, an increased number of data lines would enable more data bits to be sent at a time.
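These relationships can be expressed as a short sketch. It assumes an idealized bus that completes one transfer on every clock tick and that each address line carries one binary digit; real buses may fall short of the peak figure, and the function names are introduced here only for illustration.

```python
def peak_bandwidth_mbps(clock_mhz: float, data_lines: int) -> float:
    """Peak throughput in megabytes per second (1 MB = 10**6 bytes).

    Assumption: one transfer of `data_lines` bits on every clock tick.
    """
    return clock_mhz * data_lines / 8


def addressable_locations(address_lines: int) -> int:
    """Number of distinct locations selectable with the given address lines."""
    return 2 ** address_lines


if __name__ == "__main__":
    base = peak_bandwidth_mbps(33.0, 32)
    faster_clock = peak_bandwidth_mbps(66.0, 32)  # double the clock speed
    wider_bus = peak_bandwidth_mbps(33.0, 64)     # double the data lines
    print(base, faster_clock, wider_bus)
    print(addressable_locations(16), addressable_locations(17))
```

Under these assumptions, doubling either the clock speed or the number of data lines doubles the peak throughput, while each additional address line doubles the number of addressable locations.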
As described above, a computer may use various buses or a combination of buses. Currently known types of buses are summarized below in TABLE 1:
TABLE 1
Bus Type | Max Clock Speed | Max Word Length | Comments
Industrial Standard Architecture (ISA) | 8 MHz | 8 or 16 bits | Requires two clock ticks for 16 bit data transfer. Very slow for high performance disk accesses and high performance video cards.
Enhanced Standard Architecture (EISA) | 8.33 MHz | 32-bit | Can support lots of devices. Supports older devices which have slower or smaller word lengths. Transfers data every clock tick.
Micro channel Architecture (MCA) | 10 MHz | 32-bit | Transfers data every clock tick.
Video Electronics Standard Association (VESA)/Enhanced Video Electronics Standard Association Local Bus (VL) | 33 MHz | 32-bit | Cannot take advantage of 64-bit architecture. Restricted on the number of devices which can be connected (1 or 2 devices).
Peripheral Component Interconnect (PCI) | 33 or 66 MHz | 32 or 64 bit | The PCI bus has a special chip set which allows more sophisticated control over the devices. PCI Bus can support many devices.
Peripheral Component Interconnect Extended (PCI-X) | 66 or 133 MHz | 64 bit | Primarily in computer servers.
Currently, most personal computer systems rely on a PCI bus to connect their different hardware devices. PCI is a 64-bit bus, though it is usually implemented as a 32-bit bus. A PCI bus runs at clock speeds of 33 or 66 MHz. At 32 bits and 33 MHz, the PCI local bus standard yields a throughput rate of 133 MBps. In the case of video cards, the bandwidth of the PCI bus has become increasingly limiting.
Related to PCI, Peripheral Component Interconnect Extended (PCI-X) is a computer bus technology that increases the speed at which data can move within a computer from 66 MHz to 133 MHz. Thus, PCI-X potentially doubles the speed and amount of data exchanged between the computer processor and peripherals. With PCI-X, one 64-bit bus runs at 133 MHz with the rest running at 66 MHz, allowing for a data exchange of 1.06 GB per second. PCI-X, however, is used primarily in computer servers, and not in desktop computers.
In response to the bandwidth limitations of the PCI Bus, the Accelerated Graphics Port (“AGP”) bus was developed for use with graphics processing devices, and most high performance video cards currently connect to the computer exclusively through a dedicated AGP slot found on the motherboard. AGP is based on PCI but is designed especially for the throughput demands of 3-D graphics. Rather than using the PCI bus for graphics data, AGP introduces a dedicated point-to-point channel so that the graphics controller can directly access main memory. The AGP channel is 32 bits wide and runs at 66 MHz. This translates into a total bandwidth of 266 MBps, as opposed to the PCI bandwidth of 133 MBps. AGP also supports three optional faster modes, with throughputs of 533 MBps (2×), 1.07 GBps (4×), and 2.14 GBps (8×). In addition, AGP further improves graphics performance by allowing graphics-related data and 3-D textures to be stored in main memory rather than video memory.
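The PCI and AGP bandwidth figures quoted above follow from the channel width and clock speed. The sketch below reproduces the arithmetic under two assumptions that are not stated in the text: the nominal clocks are taken as 33.33 MHz and 66.66 MHz, and the faster AGP modes are modeled as 2, 4, or 8 transfers per clock tick. Results may therefore differ from the quoted figures in the last digit because of rounding.

```python
PCI_CLOCK_MHZ = 33.33  # assumed nominal value for the "33 MHz" PCI clock
AGP_CLOCK_MHZ = 66.66  # assumed nominal value for the "66 MHz" AGP clock
WIDTH_BITS = 32        # both channels are 32 bits wide in this comparison


def bandwidth_mbps(clock_mhz: float, width_bits: int, transfers_per_tick: int = 1) -> float:
    """Peak throughput in megabytes per second (1 MB = 10**6 bytes)."""
    return clock_mhz * (width_bits / 8) * transfers_per_tick


if __name__ == "__main__":
    print(f"PCI     : {bandwidth_mbps(PCI_CLOCK_MHZ, WIDTH_BITS):7.0f} MBps")
    for mode in (1, 2, 4, 8):
        rate = bandwidth_mbps(AGP_CLOCK_MHZ, WIDTH_BITS, mode)
        print(f"AGP {mode}x  : {rate:7.0f} MBps")
```

Each faster AGP mode simply doubles the throughput of the previous one, and the base AGP rate is twice the 32-bit, 33 MHz PCI rate because only the clock differs.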
As the major hardware subsystems get faster, each at a different rate, and move more data, PCI and other currently used interconnects cannot handle the load. Also, with the increasingly powerful and complex GPUs and better optimized and more capable APIs, bus bandwidth is again becoming a primary limitation to graphics system performance. Furthermore, many current and emerging tasks need faster processors, graphics, networking, and storage subsystems, and that translates into a need for much faster interconnects between those subsystems. Accordingly, new types of scalable bus standards, such as PCI Express (described in greater detail below), are being developed to address these limitations while preserving compatibility with existing components.
Despite the above-described innovations and other known advances for enabling improvements in computer graphic performance, there remains a continuous need for further improvements. For commercial viability, these improvements should use commonly available, off-the-shelf components. Furthermore, the improvements should not require extensive changes in hardware or software, so that the improved computer retains general compatibility with existing components and applications.
No known, commonly available computer currently uses two or more high performance graphics cards.