Even before the beginning of the widespread use of personal computers, computer graphics has been one of the most promising, and most challenging, aspects of computing. The first graphics personal computers developed for mass markets relied on the main computer processing unit (“CPU”) to control every aspect of graphics output. Graphics boards, or video cards, in early systems acted as simple interfaces between the CPU and the display device, and did not conduct any processing of their own. In other words, these early video cards simply translated low level hardware commands issued by the CPU into analog signals which the display devices transformed into on-screen images. Because all of the processing was conducted by the CPU, graphics-intensive applications had a tendency to over-utilize processing cycles and prevent the CPU from performing other duties. This led to overall sluggishness and degraded system performance.
To offload the graphics workload from the CPU, hardware developers introduced video cards equipped with GPUs. GPUs are capable of accepting high level graphics commands and processing them internally into the video signals required by display devices. By way of an extremely simplistic example, if an application requires a triangle to be drawn on the screen, rather than requiring the CPU to instruct the video card where to draw individual pixels on the screen (i.e., low level hardware commands), the application could simply send a “draw triangle” command to the video card, along with certain parameters (such as the location of the triangle's vertices), and the GPU could process such high level commands into a video signal. In this fashion, graphics processing previously performed by the CPU is now performed by the GPU. This innovation allows the CPU to handle non-graphics related duties more efficiently.
The primary drawback with early GPU-based video cards was that there was no set standard for the “language” of the various high level commands that the GPUs could interpret and then process. As a result, every application that sought to utilize the high level functions of a GPU-based video card required a specialized piece of software, commonly referred to as a “driver”, which could understand the GPU's “language.” With hundreds of different GPU-based video cards on the market, application developers became bogged down in writing these specialized drivers. In fact, it was not uncommon for a particularly popular software program to include hundreds, if not thousands, of video card drivers with its executable code. This, of course, greatly slowed the development and adoption of new software.
This “language” problem was resolved by the adoption, in modern computer operating systems, of standard methods of video card interfacing. Modern operating systems, such as the Windows® operating system (sold by Microsoft Corporation of Redmond, Wash.), require only a single hardware driver to be written for a video card. Interaction between the various software applications, the CPU and the video card is mediated by an intermediate software “layer” termed an Application Programming Interface (“API” or “API module”). All that is required is that the video drivers and the applications be able to interpret a common graphics API. The two most common graphics APIs in use in today's personal computers are DirectX®, distributed by Microsoft Corporation of Redmond, Wash., and OpenGL®, distributed by a consortium of computer hardware and software interests.
Since the advent of the GPU-based graphics processing subsystem, most efforts to increase the throughput of personal computer graphics subsystems (i.e., make the subsystem process information faster) have been geared, quite naturally, toward producing more powerful and complex GPUs, and optimizing and increasing the capabilities of their corresponding APIs.
Another way in which hardware developers have sought to increase the graphics subsystem throughput is by using multiple GPUs on a single video card to simultaneously process graphics information. An example of this technology is described in U.S. Pat. No. 6,473,086 to Morein et al. (the '086 patent). In the '086 patent, video command signals from APIs such as DirectX or OpenGL are processed by multiple GPUs, typically two, which are housed on a single video card. One GPU is designated as a “primary” GPU and the other as a “secondary” GPU. Although both GPUs independently process graphics commands that derive from an API, the secondary GPU must still route the information it processes (i.e., the digital representation for the portion of the screen assigned to it) through the primary GPU which, in turn, transfers a single, combined output video signal to the video display device. One obvious and significant drawback with this system, which is also prevalent in the single-video card systems discussed below, is that a high bandwidth pipeline must exist between the two GPUs.
Other attempts at multi-GPU computer graphics subsystems are described in U.S. Pat. No. 5,485,559 to Sakaibara, et al. (the '559 patent); U.S. Pat. No. 5,638,531 to Crump et al. (the '531 patent); U.S. Pat. No. 5,818,469 to Lawless et al. (the '469 patent); U.S. Pat. No. 5,841,444 to Mun et al. (the '444 patent); U.S. Pat. No. 6,008,821 to Bright et al. (the '821 patent); U.S. Pat. No. 6,157,393 to Potter et al. (the '393 patent); U.S. Pat. No. 6,384,833 to Benneau et al. (the '833 patent); and U.S. Pat. No. 6,529,198 to Miyauchi (the '198 patent).
The '559 patent describes a parallel graphics processor with graphics command distributor and command sequencing method, wherein a main processor sends to a command distribution device a series of graphic commands including an attribute command updating the state of the attribute which designates a display mode, and a primitive command defining graphics to be displayed. The command distribution device sequentially distributes the series of graphic commands to a plurality of geometry processors which process the graphics according to the type of command. The primitive command is sent to any one of the plurality of geometry processors. At least those of the attribute commands which relate to the attributes of display used by the geometry processors are sent to all the geometry processors. The pixel commands comprising the outputs of those geometry processors are sent to a pixel processor which generates an image corresponding to the pixel commands. The pixel processor arranges pixel commands from the plurality of geometry processors on the basis of the data on the allocation of commands received from the command distribution device and then displays the graphic in the form of geometry data.
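By way of illustration only, the distribution rule summarized above (attribute commands to all geometry processors, each primitive command to one of them) might be sketched as follows. The function name, the tuple representation of commands, and the round-robin choice of processor are assumptions made for this sketch and are not taken from the '559 patent.

```python
from itertools import cycle


def distribute(commands, n_processors):
    """Broadcast attribute commands; send each primitive to one processor.

    Round-robin selection is an assumption made for this sketch.
    """
    queues = [[] for _ in range(n_processors)]
    allocation = []  # processor index chosen for each primitive, in issue order
    next_proc = cycle(range(n_processors))
    for kind, payload in commands:
        if kind == "attribute":
            # Every geometry processor must see display-attribute updates.
            for queue in queues:
                queue.append((kind, payload))
        elif kind == "primitive":
            # A primitive is processed by exactly one geometry processor.
            p = next(next_proc)
            queues[p].append((kind, payload))
            allocation.append(p)
        else:
            raise ValueError(f"unknown command kind: {kind}")
    return queues, allocation


# Hypothetical command stream for illustration.
commands = [
    ("attribute", "color=red"),
    ("primitive", "triangle A"),
    ("primitive", "triangle B"),
    ("attribute", "color=blue"),
    ("primitive", "triangle C"),
]
queues, allocation = distribute(commands, 2)
```

The `allocation` list plays the role of the allocation data that a downstream pixel processor could use to arrange the outputs back into issue order.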
The '531 patent describes a multiprocessor integrated circuit with video refresh logic employing instruction/data caching and associated timing synchronization, wherein a digital data handling system handling display signal streams has a video processor which is capable of high performance due to vector processing and special addressing modes. The video processor is a single VLSI device having a plurality of processors, each of which has associated instruction and data caches, which are joined together by a wide data bus formed on the same substrate as the processors. Any graphics subsystem must be able to support a variety of video formats, causing video refresh logic to become complicated. Such potential complications are avoided by the provision of a simple, general purpose hardware refresh system based on the direct color graphics frame buffer, with the broader range of services being provided by emulation using one of the processors included in the device.
The '469 patent describes a method and implementing multiprocessor computer system in which graphics applications are executed in conjunction with a graphics interface to graphics hardware. A master thread, or master node in a distributed network system, receives commands from a graphics application and assembles the commands into workgroups with an associated workgroup control block and a synchronization tag. For each workgroup, the master thread flags changes in the associated workgroup control block. At the end of each workgroup, the master thread copies the changed attributes into the associated workgroup control block. The workgroup control blocks are scanned by the rendering threads, or rendering nodes in a distributed network system, and unprocessed workgroups are locked, and the rendering thread's attribute state is updated from the previous workgroup control blocks. Once the rendering thread has updated its attributes, it has the necessary state to independently process the workgroup, thus allowing parallel execution. A synchronizer thread reorders the graphics datastream, created by the rendering threads, using the synchronization tags and sequentially sends the resultant data to the graphics hardware.
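The reordering role of the synchronizer thread described above might be sketched, under stated assumptions, as follows. The sketch assumes that synchronization tags are consecutive integers starting at 0 and that rendered workgroups arrive as (tag, data) pairs in completion order; neither detail is taken from the '469 patent, and the function name is hypothetical.

```python
import heapq


def reorder(results):
    """Yield rendered workgroups in synchronization-tag order.

    `results` is an iterable of (tag, data) pairs in completion order.
    Tags are assumed to be consecutive integers starting at 0.
    """
    pending = []   # min-heap of workgroups that finished early
    expected = 0   # tag of the next workgroup that may be released
    for tag, data in results:
        heapq.heappush(pending, (tag, data))
        # Release every workgroup that is now contiguous with the output.
        while pending and pending[0][0] == expected:
            yield heapq.heappop(pending)[1]
            expected += 1


# Hypothetical completion order from parallel rendering threads.
completed = [(2, "wg2"), (0, "wg0"), (1, "wg1"), (3, "wg3")]
ordered = list(reorder(completed))
```

Workgroups that finish ahead of their turn are held until all earlier tags have been released, so the data reaches the hardware in the order the application issued it.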
The '444 patent describes a multiprocessor graphics system having a pixel link architecture, which includes: (1) a plurality of sub-graphics systems, each of which is assigned to one of a plurality of sub-screens provided by sectioning a display screen; and (2) a ring network for connecting the plurality of sub-graphics systems. Each of the sub-graphics systems includes a geometry engine, a raster engine, a local frame buffer and a pixel distributor. An interconnection network bottleneck between the raster engine and frame buffer is removed and a conventional memory system can be used by reducing the number of data transmissions between the raster engine and frame buffer while maintaining image parallelism and object parallelism.
The '821 patent describes a multiple embedded memory frame buffer system including a master graphics subsystem and a plurality of slave graphics subsystems. Each subsystem includes a frame buffer and a color palette for decompressing data in the frame buffer. The master subsystem further includes a digital to analog converter coupled to receive the decompressed digital data from the palette of each subsystem and outputting analog versions of the digital data to an output device. The system further includes a timing system for determining which outputs of the subsystems are to be converted by the digital to analog converter at a given time. A method of synchronization of embedded frame buffers for data transfer through a single output includes the steps of generating a first clock signal and a second clock signal in a master embedded frame buffer, sending the first and second clock signals to a slave embedded frame buffer and delaying the second clock signal to be in phase with a third clock signal generated by a graphics controller such that no data is lost when transferring data from the master and slave embedded frame buffers.
The '393 patent describes an apparatus for and method of directing graphical data toward a display device from a plurality of graphics processors by coupling the graphics processors in a manner that reduces the size of the interface on each graphics processor. In particular, each graphics processor produces graphical data for an associated set of pixels on the display device, where each pixel is represented by a first amount of graphical data. The graphics processors are arranged so that one of the graphics processors is a destination processor. The total number of graphics processors that are not designated as the destination processor thus constitute a remaining number. Each graphics processor produces a second amount of graphical data during each clock cycle of a common clock. The first amount of graphical data, however, is comprised of at least substantially two times the second amount of graphical data. The graphics processors then are coupled so that during each clock cycle, the destination processor receives no more graphical data from the other processors than an amount equal to the product of the remaining number and the second amount.
The '833 patent describes a method and parallelizing geometric processing in a graphics rendering pipeline, wherein the geometric processing of an ordered sequence of graphics commands is distributed over a set of processors by the following steps. The sequence of graphics commands is partitioned into an ordered set of $N$ subsequences $S_0 \ldots S_{N-1}$, and an ordered set of $N$ state vectors $V_0 \ldots V_{N-1}$ is associated with said ordered set of subsequences $S_0 \ldots S_{N-1}$. A first phase of processing is performed on the set of processors whereby, for each given subsequence $S_j$ in the set of subsequences $S_0 \ldots S_{N-2}$, state vector $V_{j+1}$ is updated to represent state as if the graphics commands in subsequence $S_j$ had been executed in sequential order. A second phase of the processing is performed whereby the components of each given state vector $V_k$ in the set of state vectors $V_1 \ldots V_{N-1}$ generated in the first phase is merged with corresponding components in the preceding state vectors $V_0 \ldots V_{k-1}$ such that the state vector $V_k$ represents state as if the graphics commands in subsequences $S_0 \ldots S_{k-1}$ had been executed in sequential order. Finally, a third phase of processing is performed on the set of processors whereby, for each subsequence $S_m$ in the set of subsequences $S_1 \ldots S_{N-1}$, geometry operations for subsequence $S_m$ are performed using the state vector $V_m$ generated in the second phase. In addition, in the third phase, geometry operations for subsequence $S_0$ are performed using the state vector $V_0$.
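The three phases summarized above can be illustrated with a minimal sketch. It assumes, purely for illustration, that graphics state is a dictionary, that commands are either ("set", key, value) or ("draw", name) tuples, and that all names are hypothetical; none of this is taken from the '833 patent. The loops run sequentially here, although the per-subsequence work in the first and third phases is the part that could be spread across processors.

```python
def state_delta(subseq):
    """Phase 1: collect the state changes one subsequence would make."""
    delta = {}
    for cmd in subseq:
        if cmd[0] == "set":
            _, key, value = cmd
            delta[key] = value
    return delta


def merge_prefix(initial, deltas):
    """Phase 2: vectors[k] becomes the state after subsequences 0..k-1."""
    vectors = [dict(initial)]
    for delta in deltas:
        vectors.append({**vectors[-1], **delta})
    return vectors


def run_geometry(subseq, start_state):
    """Phase 3: execute one subsequence from its own starting state."""
    state = dict(start_state)
    out = []
    for cmd in subseq:
        if cmd[0] == "set":
            state[cmd[1]] = cmd[2]
        else:  # ("draw", name): record the state the draw was issued under
            out.append((cmd[1], dict(state)))
    return out


def parallel_geometry(subseqs, initial):
    deltas = [state_delta(s) for s in subseqs[:-1]]  # first N-1 subsequences
    vectors = merge_prefix(initial, deltas)          # one vector per subsequence
    results = []
    for subseq, vector in zip(subseqs, vectors):
        results.extend(run_geometry(subseq, vector))
    return results


# Hypothetical command stream split into three subsequences.
subseqs = [
    [("set", "color", "red"), ("draw", "t0")],
    [("draw", "t1"), ("set", "width", 2)],
    [("draw", "t2")],
]
initial = {"color": "black", "width": 1}
parallel = parallel_geometry(subseqs, initial)
sequential = run_geometry([c for s in subseqs for c in s], initial)
```

In this sketch, `parallel` and `sequential` hold the same draw records, which is the property the three-phase scheme is meant to preserve.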
The '198 patent describes a parallel rendering device, wherein a rendering command/data generator distributes rendering commands and data to each of the rendering devices with rendering commands and data for one screen as a unit. Each of the rendering devices carries out generating of display data and storing of the display data in a rendering memory incorporated in each rendering device in accordance with the rendering commands and data. The content of the rendering memories is read out by a read signal that is supplied from a display control unit and synchronized with the scan of the display. A window number buffer issues the window number of the window in which a pixel currently to be displayed is included. A window number/rendering device management table issues the device number of the rendering device as a selection signal. A display switch selects the rendering device of the device number indicated by the selection signal to connect the rendering device to the display. In this way, the most recent display data of the window of the above-described window number is supplied to the display.
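The per-pixel selection path described above (window number buffer, then management table, then display switch) might be modeled as a pair of lookups. The data layout below, with nested lists for the buffers and a dictionary for the management table, is an assumption made for this sketch rather than a detail of the '198 patent.

```python
def select_pixel(x, y, window_number_buffer, device_table, rendering_memories):
    """Return the display data for pixel (x, y).

    The window number buffer names the window that owns the pixel, the
    management table maps that window to a rendering device, and the
    "switch" reads the pixel from that device's rendering memory.
    """
    window = window_number_buffer[y][x]
    device = device_table[window]
    return rendering_memories[device][y][x]


# Hypothetical 2x2 display: window 0 owns the top row, window 1 the bottom.
window_number_buffer = [[0, 0], [1, 1]]
device_table = {0: 1, 1: 0}  # window number -> rendering device number
rendering_memories = {
    0: [["a0", "a1"], ["a2", "a3"]],
    1: [["b0", "b1"], ["b2", "b3"]],
}
scanout = [
    [
        select_pixel(x, y, window_number_buffer, device_table, rendering_memories)
        for x in range(2)
    ]
    for y in range(2)
]
```

Updating an entry in `device_table` redirects a whole window to a different rendering device without copying any pixel data.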
Yet other attempts at multi-GPU computer graphics subsystems and/or related inventions are described in U.S. Pat. No. 5,473,750 to Hattori; U.S. Pat. No. 5,560,034 to Goldstein; U.S. Pat. No. 5,774,133 to Neave et al.; U.S. Pat. No. 5,790,842 to Charles et al.; U.S. Pat. No. 5,923,339 to Date et al.; U.S. Pat. No. 5,986,697 to Cahill, III; and U.S. Pat. No. 6,329,996 to Bowen et al.
None of the devices, systems or methods mentioned above describes a graphics processing subsystem for use in a computer that combines the processing power of multiple, off-the-shelf video cards, each one having one or more GPUs, and assigns each video card to process instructions for drawing a predetermined portion of the screen which is displayed to the user through a monitor or other visual output device. In addition, none of the above devices describes a graphics processing subsystem capable of combining multiple, off-the-shelf video cards without substantial modification to the video cards.
Therefore, there is a need in the art for a graphics processing subsystem for use in a computer that combines the processing power of multiple video cards, each one having one or more GPUs, and assigns each video card to process instructions for drawing a predetermined portion of the screen which is displayed to the user through a monitor or other visual output device.
There is a further need in the art for a graphics processing subsystem capable of combining multiple, off-the-shelf video cards without substantial modification to the video cards.
There is a further need in the art for a graphics processing subsystem that can combine the processing power of multiple video cards and which does not require a high bandwidth connection between the video cards.