The invention relates generally to video graphics processors and more particularly to a method and apparatus for providing synchronization between multiple graphics processors.
Video graphic processors (VGPs) are used to render video signals to be displayed on display devices such as computer monitors. In operation, VGPs will generally receive graphics information from a system, such as a computer system, and perform the necessary graphics calculations upon the received information in order to render graphics signals. Graphics calculations are performed for many different types of information, including lighting information, user view information, texture information, and Z-plane data information, which indicates the position order of one object relative to another object. Once all calculations have been performed upon an object, the data representing the object to be displayed is written into a frame buffer. Once the graphics calculations have been repeated for all objects associated with a specific frame, the data stored within the frame buffer is rendered to create a video signal that is provided to the display device.
The amount of time taken for an entire frame of information to be calculated and provided to the frame buffer becomes a bottleneck in a video graphics system as the calculations associated with the graphics become more complicated or the total area of the objects drawn becomes very large. Contributing to the increased complexity of the graphics calculations is the increased need for higher resolution video, as well as the need for more complicated video, such as 3-D video or stereoscopic video. The video image observed by the human eye becomes distorted or choppy when the amount of time taken to provide an entire frame of video exceeds the amount of time within which the display must be refreshed with a new graphic, or new frame, in order to avoid perception by the human eye.
The use of multiple graphics adapters has been proposed in order to provide data to the frame buffer at a rate fast enough to avoid detection by the human eye. Current methods of using multiple graphics devices have partitioned the graphics associated with each frame such that each one of the multiple processors is responsible for rendering a portion of each frame. Each processor renders a portion of a frame in order to assure data is provided to the frame buffer within a required amount of time.
One such partitioning method split the screen into odd and even display lines, whereby one video adapter would render all of the odd lines associated with a specific frame, while the second device would render all of the even lines associated with the frame. Another prior art method split the screen into two discrete areas, such as a top and a bottom half, whereby each graphics device would be responsible for rendering one portion of the screen. However, problems with these implementations occur.
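The two partitioning schemes described above can be sketched as simple line-to-device mappings. This is a minimal illustration only; the function names, the zero-based line numbering, and the eight-line frame are assumptions made for the example and do not come from any particular prior art system.

```python
def interleaved_owner(line: int, num_devices: int = 2) -> int:
    """Index of the device that renders a display line when lines alternate
    between devices (hypothetical helper; lines are numbered from 0)."""
    return line % num_devices


def split_owner(line: int, height: int, num_devices: int = 2) -> int:
    """Index of the device that renders a display line when the screen is cut
    into contiguous bands, such as a top half and a bottom half."""
    band = -(-height // num_devices)  # ceiling division: lines per band
    return line // band


height = 8  # assumed frame height in lines, for illustration only
print([interleaved_owner(y) for y in range(height)])    # [0, 1, 0, 1, 0, 1, 0, 1]
print([split_owner(y, height) for y in range(height)])  # [0, 0, 0, 0, 1, 1, 1, 1]
```

In either mapping, an object that covers lines owned by both devices has to be supplied to, and processed by, both devices.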
One problem with present implementations is that all of the video data from the system needs to be sent to both of the graphics devices. For example, in the implementation where the graphics devices split the odd and even lines, it is necessary for each video device to receive the object's video information from the system. The amount of data sent by the system to the graphics adapters in effect doubles, because each graphics adapter needs all the information. In an implementation where the data is sent to both devices at the same time, there is hardware and/or software overhead associated with controlling the reception of the data.
Workload distribution is another problem associated with known graphics systems having multiple adapters. When each of the two graphics devices is processing a portion of a single frame, a likelihood exists that the amount of work to be done by one of the processors for a given frame will be significantly greater than the amount of work being done by the other video device. For example, where a first video device is to render the video for the top half of the screen, it is likely that it will have fewer calculations to perform than the device calculating the graphics for the bottom half of the frame. One reason for this disparity in workload distribution is that it is common for the top half of a frame to contain skyscape information, which is less computationally intensive than the objects associated with action video often found on the bottom half of a display device or frame. When the workload distribution is not even, one graphics device will in effect end up stalling while the second graphics device completes its calculations. This workload balancing problem yields inefficient use of the total rendering capabilities of multiple chips.
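The cost of an uneven split can be made concrete with a small calculation. The sketch below assumes a simplified model in which a frame is complete only when the slower device finishes; the function name and the per-half costs (2 and 8 time units) are hypothetical values chosen for illustration.

```python
def split_frame_stats(top_cost: float, bottom_cost: float):
    """Return (frame_time, idle_time, utilization) for two devices that each
    render one half of the same frame (simplified, assumed cost model)."""
    frame_time = max(top_cost, bottom_cost)           # frame waits for the slower device
    idle_time = frame_time - min(top_cost, bottom_cost)  # stall on the faster device
    utilization = (top_cost + bottom_cost) / (2 * frame_time)
    return frame_time, idle_time, utilization


# Sky-heavy top half (cheap) versus action-heavy bottom half (expensive).
print(split_frame_stats(2.0, 8.0))  # (8.0, 6.0, 0.625)
```

Under these assumed costs, the device rendering the top half stalls for three quarters of the frame time, and the pair delivers only 62.5 percent of its combined rendering capacity.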
Yet another problem associated with the prior embodiments is that each of the graphics devices has to calculate the shape of each and every object on the frame. Each device must calculate each object's shape in order to determine whether or not the object, or a portion of the object, must be further processed by the graphics engine associated with the graphics device. An associated problem is that when an object straddles the demarcation line between an area that the first graphics device is to process and an area that the second graphics device is to process, it is necessary for both devices to process the object. For example, when a portion of an object is in the top half of the screen, and a portion is in the bottom half of the screen, calculations associated with the object are performed by both graphics devices.
Multi-graphics processor devices, each having 3D rendering engines, typically employ page flipping as known in the art. For example, each video processing device may process alternate frames that are being displayed on one or more display devices. Page flipping occurs within a graphics processor and between multiple graphics processors. Accordingly, buffers are designated as a front buffer and a back buffer. The front buffer typically contains a frame currently being displayed, whereas the back buffer contains a frame that is receiving rendered data. Accordingly, when parallel processing is performed by multiple video graphics processors, the output from one graphics processor, whether it is a line, frame or partial frame, is flipped in favor of the output from the other processor. A page flip is typically carried out during a vertical synchronization pulse. However, a problem can arise when two rendering engines are each rendering alternate frames of different data: one rendering engine may complete rendering before the other, although the other is supposed to provide the frame that is displayed next. For example, with a 100 Hertz refresh rate, a flip may occur every 1/100th of a second. Since rendering time can vary with the complexity of the frame, there may not always be a frame available from the other graphics processor when needed, so the system must display the current frame again. This can cause great inefficiency in a multi-processor environment.
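The timing problem described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model of any actual hardware: the function name and render times are hypothetical, each processor is assumed to render its frames back to back without ever waiting for a free buffer, and a flip is attempted only at each vertical synchronization tick.

```python
def simulate_alternate_frames(render_ms, refresh_hz=100, num_procs=2):
    """Return the frame index shown at each refresh tick when frames alternate
    between processors and must be displayed in order (assumed model)."""
    period = 1000 / refresh_hz  # 10 ms between flips at 100 Hz
    busy_until = [0.0] * num_procs
    done_at = []
    for i, cost in enumerate(render_ms):
        p = i % num_procs            # frame i is rendered by processor i mod N
        busy_until[p] += cost
        done_at.append(busy_until[p])

    shown, next_frame, tick = [], 0, 1
    while next_frame < len(render_ms):
        now = tick * period
        if done_at[next_frame] <= now:
            shown.append(next_frame)  # flip to the next frame in display order
            next_frame += 1
        else:
            shown.append(shown[-1] if shown else None)  # repeat current frame
        tick += 1
    return shown


# Frame 1 is complex (25 ms); frame 2 finishes at 16 ms but cannot be shown first.
print(simulate_alternate_frames([8, 25, 8, 8]))  # [0, 0, 1, 2, 3]
```

With these assumed render times, frame 2 is complete at 16 ms, well before frame 1 is complete at 25 ms, yet frame 0 must be displayed twice while the display waits for frame 1.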
Moreover, typically, the host processor writes the rendering commands and page flip commands to a command queue processor for each graphics processor. Also, programmable array logic (PAL) is operatively coupled to receive output from each chip in the order required for display. The programmable array logic typically selects which frame to send to a display device as received from a frame buffer associated with each of the multiple graphics processors. Accordingly, the last command in a command queue indicates that the frame is rendered and that a flip can occur. As known in the art, the host processor typically queues the commands for the rendering engines in a command FIFO queue for each frame based on commands from, for example, a 3D video game or other source using a 3D rendering engine.
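The queue arrangement described above, in which the flip command is simply the last entry queued for each frame, can be sketched as follows. The command tuples, the function name, and the draw command labels are hypothetical and are used only to show the ordering; no real command format is implied.

```python
from collections import deque


def queue_frame(fifo, frame_id, draw_commands):
    """Append one frame's rendering commands to a processor's FIFO, followed
    by a flip command that marks the frame as fully rendered."""
    for cmd in draw_commands:
        fifo.append(("DRAW", frame_id, cmd))
    fifo.append(("FLIP", frame_id, None))  # last command for the frame


# One FIFO per graphics processor; the host alternates frames between them.
fifos = [deque(), deque()]
for frame_id in range(4):
    queue_frame(fifos[frame_id % 2], frame_id, ["triangles", "textures"])

print(list(fifos[0])[:3])
# [('DRAW', 0, 'triangles'), ('DRAW', 0, 'textures'), ('FLIP', 0, None)]
```

Because each flip command travels in its own processor's queue, nothing in this arrangement tells one processor whether the other has reached its flip.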
Because the queue-based flip command is queued with the rendering commands, each chip or graphics processor does not typically know when the other processor is finished. Accordingly, the frame that is supposed to be displayed subsequent to a current frame may be completely rendered before the current frame is rendered. One proposed solution is to have the host processor stall and wait until it is notified which frame has been displayed. However, this typically requires the host processor to poll each processing chip to determine what stage it is in. This can drastically slow down performance of a system. It is important to operate the host processor and the rendering engines efficiently.
It is also known to divide a frame either horizontally or vertically and have one graphics processor provide, for example, one half of the frame and have the other processor provide the other half of the frame. For example, one processor can provide the rendered lines for the first half of the horizontal lines for a frame and the other graphics processor can provide the other half of the rendered lines for the same frame. Alternatively, a portion of each horizontal line can be provided by an alternate graphics processor if desired. Further, the host processor still typically includes a page flip command in the command queue for both processing devices. If each graphics processing device has a variable render time, a host processor stall typically must be employed. Accordingly, with variable rendering times, synchronization may not be provided.
Therefore, it would be desirable to have a method and apparatus that allows the use of multiple video graphics devices that overcome the problems associated with the prior art.