The three-dimensional graphic pipeline architecture breaks down into the segmented stages of CPU, bus, GPU vertex processing and GPU fragment (pixel) processing. A given pipeline is only as strong as the weakest of these stages, so the main bottleneck determines the overall throughput. Enhancing performance is therefore a matter of reducing or eliminating bottlenecks. The major bottleneck depends strongly on the application. The extreme cases are CAD-like (Computer Aided Design) applications, characterized by an abundance of polygons (vertices), and video-game applications, which have a small polygon count but intensive fragment activity (e.g., texturing). The first class suffers from vertex processing bottlenecks, while the second class suffers from fragment bottlenecks. Both are frequently congested at the PC bus. Many applications have mixed characteristics, where the bottleneck may alternate unpredictably between these extremes from one frame to the next.
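The weakest-link relationship described above can be sketched in a few lines. This is only an illustration: the stage names follow the text, but the function name `pipeline_throughput` and every rate value are hypothetical numbers chosen for the example, not measurements.

```python
def pipeline_throughput(stage_rates):
    """Return (bottleneck_stage, frames_per_second) for a staged pipeline.

    stage_rates maps each stage name to the frame rate that stage could
    sustain on its own. The pipeline as a whole runs at the slowest rate.
    """
    bottleneck = min(stage_rates, key=stage_rates.get)
    return bottleneck, stage_rates[bottleneck]


# Hypothetical per-stage rates (frames per second), for illustration only.
cad_like = {"cpu": 120.0, "bus": 90.0, "vertex": 15.0, "fragment": 200.0}
game_like = {"cpu": 120.0, "bus": 90.0, "vertex": 300.0, "fragment": 40.0}

print(pipeline_throughput(cad_like))   # vertex stage limits the CAD-like case
print(pipeline_throughput(game_like))  # fragment stage limits the game-like case
```

Speeding up any stage other than the reported bottleneck leaves the overall rate unchanged, which is why the choice of parallelization method depends on which stage limits a given application.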
One way to improve performance beyond what a single GPU delivers is to run multiple GPUs in parallel according to one of the bottleneck-solving methods. There are two predominant methods for rendering graphic data with multiple GPUs: time division (time domain composition), in which each GPU renders the next successive frame, and image division (screen space composition), in which each GPU renders a subset of the pixels of each frame. A third, much less popular method is object division (polygon decomposition).
In the time division method, each GPU renders the next successive frame. Its disadvantage is that each GPU must render an entire frame, so the speed at which any single frame is rendered is limited to the rendering rate of a single GPU. While multiple GPUs enable a higher frame rate, they can introduce a delay in the response time (latency) of the system to a user's input. This occurs because, while only one GPU is engaged in displaying a rendered frame at any given time, each of the GPUs is in the process of rendering one of a series of frames in a sequence. To maintain the high frame rate, the system delays the user's input until the specific GPU that first received the signal cycles through the sequence and is again engaged in displaying its rendered frame. In practical applications, this condition limits the number of GPUs that can be used in a system. With large data sets there is another bottleneck, because each GPU must be able to access all the data. This requires either maintaining multiple copies of the large data sets or risking conflicts in accessing a single copy.
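The trade-off between frame rate and latency in time division can be illustrated with a simplified model. The sketch assumes that every frame takes the same time on every GPU, that frames are handed out round-robin, and that synchronization and bus overhead are zero. The function name `time_division` and the sample numbers are hypothetical.

```python
def time_division(num_gpus, frame_time_s, num_frames):
    """Model round-robin time division under idealized assumptions.

    Returns (assignment, frame_rate, min_latency_s) where assignment[i] is
    the index of the GPU that renders frame i.
    """
    # Each GPU takes the next successive frame in turn.
    assignment = [frame % num_gpus for frame in range(num_frames)]
    # With N GPUs working on N different frames, finished frames arrive
    # N times as often as from a single GPU.
    frame_rate = num_gpus / frame_time_s
    # Any individual frame still takes a full single-GPU render time,
    # so latency cannot drop below that value.
    min_latency_s = frame_time_s
    return assignment, frame_rate, min_latency_s


assignment, fps, latency = time_division(num_gpus=4, frame_time_s=0.25, num_frames=6)
print(assignment, fps, latency)
```

In this model, adding GPUs raises the frame rate but never lowers the time needed to produce any one frame, which matches the latency limitation described above.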
The image division method splits the screen between N GPUs, such that each one displays 1/N of the image. The entire polygon set is transferred to each GPU for processing; however, the pixel processing of each GPU is significantly reduced, since it is confined to that GPU's own window. Image division has no latency issues, but it has a similar bottleneck with large data sets, since each GPU must examine the entire database to determine which graphic elements fall within the portion of the screen allocated to said GPU. The image division method suits applications with intensive pixel processing.
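A minimal sketch of image division follows. It assumes the screen is split into horizontal strips (the text does not prescribe a particular split) and that each polygon is reduced to its vertical extent `(y_min, y_max)` in pixel rows. The function names `split_rows` and `polygons_for_strip` are hypothetical.

```python
def split_rows(height, num_gpus):
    """Divide `height` pixel rows into num_gpus half-open ranges (y0, y1)."""
    bounds = [height * i // num_gpus for i in range(num_gpus + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def polygons_for_strip(polygons, strip):
    """Keep the polygons whose vertical extent overlaps the strip.

    Every GPU has to scan the whole polygon list to perform this test,
    which is the large-data-set bottleneck noted in the text.
    """
    y0, y1 = strip
    return [p for p in polygons if p[0] < y1 and p[1] > y0]


strips = split_rows(1080, 4)
polygons = [(0, 100), (200, 300), (1000, 1080)]  # hypothetical extents
for gpu, strip in enumerate(strips):
    print(gpu, strip, polygons_for_strip(polygons, strip))
```

Each GPU fills only the pixels of its own strip, so fragment work shrinks with N, while the scan over the full polygon list does not.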
The object division method is based on the distribution of data subsets between multiple GPUs. The data subsets are rendered in the GPU pipeline and converted to a Frame Buffer (FB) of fragments (sub-image pixels). The sub-images held in the multiple FBs have to be merged (composited) to generate the final image to be displayed. Object division delivers parallel rendering at the level of a single frame of very complex data consisting of a large number of polygons. The input data is decomposed at the polygon level and re-composed at the pixel level. A proprietary driver intelligently distributes the data streams, which are generated by the application, between all GPUs. The rasters generated by the GPUs are composited into a final raster, which is moved to the display. The object division method is well suited to applications that need to render a vast amount of geometrical data. Typically, these are CAD, Digital Content Creation, and comparable visual simulation applications, considered as “viewers”, meaning that the data has been pre-designed such that its three-dimensional position in space is not under the interactive control of the user. However, the user does have interactive control over the viewer's position, the direction of view, and the scale of the graphic data. The user may also have control over the selection of a subset of the data and the method by which it is rendered. This includes manipulating the effects of image lighting, coloration, transparency and other visual characteristics of the underlying data.
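The decompose-then-composite flow of object division can be sketched as follows. Two assumptions go beyond the text. First, the sub-images are merged by keeping the nearest fragment at each pixel (a depth comparison); the text only says that the sub-images are composited. Second, each "polygon" is simplified to a single fragment `(x, y, depth, color)`. All function names are hypothetical, and `render` merely stands in for a GPU.

```python
INF = float("inf")


def distribute(items, num_gpus):
    """Deal the input data out to the GPUs (polygon-level decomposition)."""
    return [items[i::num_gpus] for i in range(num_gpus)]


def render(fragments, width, height):
    """Stand-in for one GPU: build a sub-image with a color and depth buffer."""
    color = [[None] * width for _ in range(height)]
    depth = [[INF] * width for _ in range(height)]
    for x, y, z, c in fragments:
        if z < depth[y][x]:  # keep the nearest fragment at this pixel
            depth[y][x], color[y][x] = z, c
    return color, depth


def composite(sub_images, width, height):
    """Merge sub-images pixel by pixel, keeping the nearest fragment."""
    final_color = [[None] * width for _ in range(height)]
    final_depth = [[INF] * width for _ in range(height)]
    for color, depth in sub_images:
        for y in range(height):
            for x in range(width):
                if depth[y][x] < final_depth[y][x]:
                    final_depth[y][x] = depth[y][x]
                    final_color[y][x] = color[y][x]
    return final_color


def object_division(fragments, num_gpus, width, height):
    subsets = distribute(fragments, num_gpus)
    sub_images = [render(s, width, height) for s in subsets]
    return composite(sub_images, width, height)


frags = [(0, 0, 5.0, "red"), (0, 0, 2.0, "blue"), (1, 0, 3.0, "green")]
print(object_division(frags, num_gpus=2, width=2, height=1))
```

Under these assumptions the composited result equals what a single GPU would produce from the full data set, while each GPU processes only its own share of the geometry.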
In the above applications, the data tends to be very complex, as it usually consists of a massive number of geometrical entities in the display list or vertex array. Therefore, the construction time of a single frame tends to be very long (typically 0.5 sec for 20 million polygons), which in turn slows down the overall system performance.
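The figures quoted above translate directly into a frame rate and a polygon rate. The short calculation below uses only the 0.5 sec and 20 million polygon values from the text; the helper names are hypothetical.

```python
def frame_rate(frame_time_s):
    """Frames per second for a given single-frame construction time."""
    return 1.0 / frame_time_s


def polygons_per_second(polygon_count, frame_time_s):
    """Sustained polygon throughput implied by one frame's construction time."""
    return polygon_count / frame_time_s


print(frame_rate(0.5))                       # 2.0 frames per second
print(polygons_per_second(20_000_000, 0.5))  # 40000000.0 polygons per second
```

At two frames per second, a viewer application cannot respond smoothly to changes in the viewing position, which is the performance problem described here.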
Therefore, there is a need for a system that can guarantee the best system performance even when exposed to high traffic over the PC (Personal Computer) bus.
It is an object of the present invention to amplify the strength of the GPU by parallelizing multiple GPUs.
It is another object of the present invention to provide a system, wherein the construction time of a single frame does not slow down the overall system response.
It is still another object of the present invention to provide a system and method, wherein the graphic pipeline bottlenecks of vertex processing and fragment processing are transparently and intelligently resolved.
It is still a further object of the present invention to provide a system and method that has high scalability and unlimited scene complexity.
It is still a further object of the present invention to provide a process that overcomes the difficulties imposed by data decomposition, that is, the partitioning of data and graphic commands between GPUs.
It is still a further object of the present invention to provide a method and system for an intelligent decomposition of data and graphic commands, preserving the basic features of graphic libraries as state machines and complying with graphic standards.
Other objects and advantages of the invention will become apparent as the description proceeds.