The present invention relates to a method of optimizing graphics processing in general, and in particular, to a method of optimizing graphics processing for a multiprocessor system.
It is common for current computer systems to generate graphics images. This is a specialized and computationally expensive procedure. Therefore, many computer systems utilize specialized hardware to perform varying parts of the graphics processing. FIG. 1 shows one example of such a current system. A main processor 100 is a general purpose processor. Geometry processor 102 is a specialized subprocessor within main processor 100 for performing the common transforms necessary to convert a three-dimensional image to a two-dimensional image while taking into account such issues as perspective. Such a geometry processor can also be used for other common three-dimensional graphics calculations. Geometry processor 102 may also be separate from main processor 100, or can be executed as a software program on main processor 100. The final result from geometry processor 102 is a display list which contains the information necessary for creating a graphic figure. Typically, the display list will contain the information for creating a polygon. The polygon is often a triangle, but other polygons are possible. The display list will contain the type of polygon represented, as well as the information necessary to generate that polygon. In the case of a triangle, the triangle will be indicated, and information about each of the vertices of the triangle will also be included in the display list.
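By way of illustration only, the display list described above may be sketched as a simple data structure. The sketch below is not taken from the system of FIG. 1; the names `Vertex`, `DisplayListEntry`, `project` and `build_display_list`, the per-vertex fields, and the single-divide perspective projection are all assumptions chosen for brevity.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Vertex:
    # Hypothetical per-vertex fields; a real display list may carry others.
    x: float
    y: float
    z: float
    rgb: Tuple[int, int, int] = (255, 255, 255)
    uv: Tuple[float, float] = (0.0, 0.0)


@dataclass
class DisplayListEntry:
    # The type of polygon represented, plus the data needed to generate it.
    polygon_type: str
    vertices: List[Vertex]


def project(point, focal_length=1.0):
    """Perspective-project a 3-D point onto the plane z = focal_length."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the viewer (z > 0)")
    scale = focal_length / z
    # The original depth is kept so later stages could still use it.
    return Vertex(x * scale, y * scale, z)


def build_display_list(triangles, focal_length=1.0):
    """Turn a list of 3-D triangles into display list entries."""
    return [
        DisplayListEntry("triangle", [project(p, focal_length) for p in tri])
        for tri in triangles
    ]
```

The work done in this sketch grows with the number of vertices and is independent of the area each polygon covers on screen.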
Bus 104 is a communications channel between geometry processor 102 and graphics processor 106. Graphics processor 106 is a specialized circuit for rendering display lists. Graphics processor 106 will include multiple subcircuits. Input buffer 108 buffers the display lists. Digital differential analyzer 112 (hereafter “DDA 112”) calculates the lines in between the vertices of the polygon. DDA 112 includes a DDA set-up engine. Pixel processor 114 performs multiple functions. Primarily, it calculates the final RGB value of each pixel within the polygon. This step is referred to as rendering. Within the process of rendering a polygon, pixel processor 114 will perform the step of texturing. Part of the determination of the RGB value of a pixel will depend on a texture that has been chosen for that particular polygon. This texture, in the form of a texture map, is applied by the pixel processor 114 to the polygon. Frame buffer 116 is a dynamic random access memory, or DRAM, which accumulates the frame reconstructed out of polygons until an entire frame or field (in the case of double buffering) is generated. That frame is then passed through bus 118 to a digital-to-analog converter, and eventually, to a monitor. Frame buffer 116 will receive data from the pixel processor 114, but the pixel processor 114 also acts as frame buffer 116. A texture buffer 118 and a DRAM page buffer 120 may be located either in the pixel processor 114, the frame buffer 116, or independent of either. The pixel processor will generally read and write to these buffers, and when necessary, these buffers are updated from the frame buffer 116.
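By way of illustration only, the line calculation performed by a digital differential analyzer may be sketched in software as follows. This is the textbook DDA line algorithm rather than a description of the circuit of DDA 112; the function name `dda_line` and the restriction to integer endpoints are assumptions. The per-step increments are computed once before the loop, which is roughly the kind of work a set-up stage might perform.

```python
def dda_line(x0, y0, x1, y1):
    """Return the pixels on the line between two integer endpoints."""
    dx = x1 - x0
    dy = y1 - y0
    # Step once per pixel along the longer axis.
    steps = max(abs(dx), abs(dy))
    if steps == 0:
        return [(x0, y0)]
    # Set-up: constant increments applied at every step.
    x_inc = dx / steps
    y_inc = dy / steps
    x, y = float(x0), float(y0)
    pixels = []
    for _ in range(steps + 1):
        # Python's round() rounds exact halves to the nearest even integer.
        pixels.append((round(x), round(y)))
        x += x_inc
        y += y_inc
    return pixels
```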
Several problems of computational efficiency can occur when one attempts to render an entire frame using the above-described apparatus. For example, when rendering a scene from a game, the game will have a background of lesser detail, a midground of greater detail, and a foreground of greatest detail. The two extremes, the background and the foreground, create two different cases. In the first case, a small number of very large polygons are used to generate a background figure. In the second case, a large number of small polygons are used to generate a detailed foreground figure. Each of these two cases has different problems.
Processing the background of a frame in the first case requires processing a small number of large polygons. The amount of processing time required for a geometry transformation is dependent on the number of vertices to be processed. Therefore, the amount of processing time necessary for the geometry processor 102 to process the background polygons will be relatively short. However, the graphics processor 106 will have multiple problems with the large polygons. First, the geometry processor will issue display lists much faster than the graphics processor 106 will be able to render the large polygons. Thus, the geometry processor will have to remain idle while the graphics processor 106 catches up. Besides the slower speed of rendering, other bottlenecks occur within the graphics processor 106 when processing large polygons. A large polygon will cover a large area of a texture map. If this area is much larger than the size of the texture buffer 118, a “texture miss” will often occur. Pixel processor 114 will remain idle while texture buffer 118 is updated from the texture map within the frame buffer so that the appropriate textures can be applied to that portion of the polygon. Further, a large polygon will cover many DRAM pages. When the rendering of a polygon reaches the end of a DRAM page, a “DRAM page break” occurs. DRAM page buffer 120 must place its contents back within the frame buffer, and the new page must be loaded from the frame buffer into the DRAM page buffer. Pixel processor 114 will remain idle during this process. Such a page break will occur frequently with a large polygon, as the size of the polygon may be many times larger than the size of the DRAM page buffer, thus causing a DRAM page break multiple times for every rendering pass. Thus, a small number of large polygons gives rise to a number of sources of slowdown in processing at the point of the graphics processor.
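By way of illustration only, the dependence of DRAM page breaks on polygon size may be estimated with the following sketch. It assumes, hypothetically, that the frame buffer is tiled into rectangular pages of `page_w` by `page_h` pixels, that coordinates are non-negative, and that a polygon may be approximated by its bounding rectangle; none of these details are specified above. The count returned by `min_page_breaks` is a lower bound, reached only if each page is completed before the next is opened.

```python
def pages_touched(x0, y0, x1, y1, page_w=64, page_h=32):
    """Count the pages covered by the inclusive pixel rectangle (x0,y0)-(x1,y1)."""
    if x0 < 0 or y0 < 0 or x1 < x0 or y1 < y0:
        raise ValueError("expected 0 <= x0 <= x1 and 0 <= y0 <= y1")
    # Integer division maps a pixel coordinate to its page column or row.
    cols = x1 // page_w - x0 // page_w + 1
    rows = y1 // page_h - y0 // page_h + 1
    return cols * rows


def min_page_breaks(x0, y0, x1, y1, page_w=64, page_h=32):
    """Lower bound on page breaks: one per page after the first."""
    return pages_touched(x0, y0, x1, y1, page_w, page_h) - 1
```

Under these assumed page dimensions, an 8 by 8 pixel polygon lying within one page incurs no page break, whereas a 640 by 480 pixel polygon covers 150 pages and therefore incurs at least 149 page breaks.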
FIG. 2 shows the result of the graphics processor and geometry processor pipeline attempting to process a small number of large polygons. The graphics processor is able to process a polygon or other graphic object within a certain time, or less, at the available computational power. Bar 202 shows the amount of computational time needed to process a background polygon or other object at the geometry processor's processing speed for the given task. Graphics processor 106 is represented by bar 204. Shaded bar 206 shows the amount of computational time needed to process the background polygon beyond the computational time available. FIG. 2 clearly shows that geometry processor 102 will remain idle, for the amount of time shown by the shaded bar 206, while graphics processor 106 catches up. Thus, the graphics processor will act as a bottleneck.
In the second, opposite case, a large number of small polygons need to be processed and rendered. This creates a different bottleneck. A large number of small polygons provides a much larger number of vertices. Thus, geometry processing will take substantially longer than in the case discussed above. FIG. 3 illustrates this. Bar 302 illustrates the amount of computational time needed to process a foreground polygon or other graphic object at the geometry processor's processing speed. Shaded bar 304 shows the amount of extra time the geometry processor will have to compute in order to “catch up” with the graphics processor. Graphics processor 106 will be able to render the foreground polygons much more quickly than in the first case. This is both because each polygon will have a smaller number of pixels to render, and because the small size of the polygons will decrease the likelihood of a DRAM page break or a texture miss. This increased processing speed is seen in bar 306, which illustrates the decreased amount of computational time that the rendering of the large number of small polygons requires. Again, shaded bar 304 represents the amount of time the graphics processor 106 will have to remain idle while geometry processor 102 processes its computational backlog. Thus, in the second case, the geometry processor acts as a bottleneck.
The above problems are exacerbated by the continuing push to use higher resolution graphics such as qualified rendered pictures. Such pictures are drawn at such high resolution that the polygon lines are not visible and smooth curves result.
FIG. 4 shows a third case, which is the computationally optimum case. In the case shown, geometry processor 102 is shown by bar 402 to process all of its display lists within a given time. Graphics processor 106 is shown by bar 404 to process and render all of its polygons in an equal amount of time. Thus, there is no bottleneck in the geometry processor-graphics processor pipeline. This is a load-balanced condition.
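By way of illustration only, the three cases of FIGS. 2 through 4 may be compared with a simple cost model. The sketch below assumes, hypothetically, that geometry time is proportional to the number of vertices and that rendering time is proportional to the number of pixels; the function names and the unit costs `time_per_vertex` and `time_per_pixel` are arbitrary placeholders and not measured values. Texture misses and DRAM page breaks are ignored.

```python
def stage_times(num_polygons, pixels_per_polygon,
                time_per_vertex=100, time_per_pixel=1,
                vertices_per_polygon=3):
    """Return (geometry_time, graphics_time) in arbitrary time units."""
    geometry = num_polygons * vertices_per_polygon * time_per_vertex
    graphics = num_polygons * pixels_per_polygon * time_per_pixel
    return geometry, graphics


def bottleneck(geometry_time, graphics_time):
    """Return (slower stage, time the other stage remains idle)."""
    if geometry_time == graphics_time:
        return ("none", 0)  # load-balanced condition
    if graphics_time > geometry_time:
        return ("graphics", graphics_time - geometry_time)
    return ("geometry", geometry_time - graphics_time)


# A few large polygons: the graphics processor is the bottleneck.
background = bottleneck(*stage_times(10, 10000))
# Many small polygons: the geometry processor is the bottleneck.
foreground = bottleneck(*stage_times(1000, 50))
# Equal stage times: neither processor waits on the other.
balanced = bottleneck(*stage_times(100, 300))
```

The idle time returned for the first two cases corresponds to the shaded bars 206 and 304, respectively.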
New architectures of computer systems will include multiple geometry processors and multiple graphics processors. The advent of such apparatuses permits new ways of dealing with the above-described problems and with other problems arising from such multiprocessor systems.