1. Field of the Invention
The present invention relates generally to state machine performance optimization and, more particularly, to the assessment of performance optimizations in a graphics system environment.
2. Related Art
Computer graphics systems are commonly used for displaying two- and three-dimensional graphics representations of objects on a two-dimensional video display screen. Current computer graphics systems provide highly detailed representations and are used in a variety of applications.
In a typical computer graphics system an object or model to be represented on the display screen is broken down into graphics primitives. Primitives are basic components of a graphics display and may include, for example, points, lines, triangles, quadrilaterals, triangle strips and polygons. Typically, a hardware/software scheme is implemented to render, or draw, the graphics primitives that represent a view of one or more objects being represented on the display screen.
Generally, the primitives of the three-dimensional object to be rendered are defined by a host computer in terms of primitive data. For example, when the primitive is a triangle, the host computer may define the primitive in terms of the X, Y, Z and W coordinates of its vertices, as well as the red, green and blue and alpha (R, G, B and xcex1xe2x80x3) color values of each vertex. Additional primitive data may be used in specific applications. Rendering hardware interpolates the primitive data to compute the display screen pixels that represent each primitive, and the R, G and B color values for each pixel.
A graphics software interface is typically provided to enable graphics applications located on the host computer to efficiently control the graphics system. The graphics interface provides specific commands that are used by a graphics application to specify objects and operations to produce an interactive, three-dimensional graphics environment. Such a graphics interface is typically implemented with software drivers.
Graphics systems generally behave as a state machine in that a specified state value remains in effect until it is changed by the graphics application through the issuance of a command, referred to herein as a graphics call, to the graphics system through the graphics interface. Thus, all vertices are rendered in accordance with a current value for each state variable in the graphics system.
By providing detailed control over the manner in which primitives and their vertices are rendered in the graphics system, the graphics software interface provides software developers with considerable flexibility in creating graphics application software programs. For example, a graphics software application may be structured in any one of many different configurations to implement a desired function or to achieve a desired result in the computer graphics system.
However, some sequences of graphics calls and the manner in which primitives, vertices and states are implemented are more efficient than others. That is, although multiple graphics applications may achieve the rendering of the same object or model on a display screen, certain graphics applications may cause the graphics system to perform unnecessary operations, or perform certain operations in a manner that requires greater overhead than perhaps is otherwise necessary.
There has been some attempt in the past to assess the performance of the graphics system and the efficiency with which the graphics system is controlled and otherwise utilized by the graphics software application. This includes identifying potential areas that may be improved and the assessment of attempted performance optimizations to the graphics application or graphics system.
Traditional graphics optimization techniques have generally required the prototyping of potential improvements in the graphics applications. Typically, the application developer first identifies areas in which the graphics system is controlled and utilized. Based upon perceived inefficiencies in the graphics system, the graphics application is modified to implement particular functions differently and to achieve a desired effect in the graphics system more efficiently. The modified graphics application is recompiled, debugged and then tested to determine whether the intended performance benefits have been realized.
This process of prototyping potential performance optimizations is time consuming and expensive. As a result, it is not uncommon for potential optimizations to not be fully explored to avoid such time and expense. In addition, the time consuming process may not necessarily yield the anticipated results, increasing the cost of those improvements which are implemented.
Furthermore, since one or more optimizations are implemented at the graphics application level, the factors contributing to changes in system performance may be masked. That is, because the full impact throughout the graphics application and graphics system of each attempted optimization is typically unknown, the contributing factors to any performance increase resulting from the implementation of such optimizations is also unknown. As a result, it is difficult or not possible for the application developer to select those optimizations which are cost effective and should be permanently implemented. In addition, for the same reasons, optimizations yielding minimal performance benefits are often implemented unnecessarily.
What is needed, therefore, is a system and method that allows for the efficient and accurate assessment of performance optimizations in graphics environments without having to prototype potential optimizations in the graphics application.
The present invention overcomes the above and other drawbacks to conventional performance optimization identification and assessment techniques. The present invention includes both methods and apparatus for evaluating and optimizing a graphics call sequence generated by a graphics application in accordance with a particular graphics application programing interface (API). In one embodiment, the graphics API is the OpenGL(copyright) API. The resulting optimized graphics call sequence causes the same graphics rendering to occur when provided to the graphics system as the original graphics call sequence. As such, the graphics application and associated graphics interface driver may then be analyzed by the application developer to identify specific modifications which, when implemented, would generate such an optimized graphics call sequence. This may include implementing specific modifications to the graphics application as well as implementing portions or all of the present invention into the associated driver for real-time execution.
Furthermore, the present invention enables the application developer to easily and non-invasively assess graphics performance optimizations made to the graphics application or associated driver based upon the optimized graphics call sequence. Significantly, the present invention allows for consideration of the changes in graphics system performance which would result from implementation of such performance optimizations. Only the desired performance optimizations may then be implemented, avoiding the inefficient process of prototyping potential performance optimizations without a full appreciation of the resulting changes in graphics system performance which will be caused by the implementation of such optimizations.
A further advantage of the present invention is that it can be used to identify usages of the graphics interface commonly invoked by the graphics application. This information can then be used by a graphics system vendor to optimize the underlying software or hardware in the graphics system to improve the manner in which the graphics system operates in response to such common graphics interface usages.
In one aspect of the invention, a performance optimization assessment system is disclosed. The system includes one or more graphics call sequence optimizers that operate on a captured graphics call sequence. The optimizers include a plurality of different optimizers, one or more of which may operate on all or a portion of the captured graphics call sequence to generate the optimized graphics call sequence. The optimizers may be performed in any desired sequence, utilizing the results of any other optimizer. In this manner the performance benefits of each type of optimizer may be individually evaluated and, if desired, combined with the results of other optimizers. In operation, a graphics call sequence capture mechanism captures a sequence of graphics calls.
In one embodiment, the optimizers includes a graphics state call coalescer. The graphics state call coalescer determines whether a given continuous series of graphics state calls in the captured graphics call sequence contains redundant, conflicting or otherwise unnecessary graphics state calls and, if so, eliminates the unnecessary graphics state calls so as to reduce the total number of graphics state calls in that continuous series. Generation of such an optimized series of graphics state calls by the graphics application would reduce the number of state changes invoked in the graphics system, thereby reducing the overhead associated with the particular continuous series of graphics state calls.
In another embodiment, the optimizers include a graphics primitive coalescer. The graphics primitive coalescer coalesces the graphics vertex calls contained in a continuous series of primitive command sets that render primitives of the same type. The graphics primitive calls that comprise a primitive command set include the glBegin( )/glEnd( ) graphics call pair as well as the intervening graphics vertex calls. Other graphics calls such as graphics attribute calls may exist in the vertex-data list between the glBegin( )/glEnd( ) pair and are not of concern to the graphics primitive coalescer and are simply passed through with their associated graphics vertex calls. When such a continuous series of primitive command sets is identified, the glBegin( )/glEnd( ) pairs are removed, coalescing the graphics vertex calls between a single glBegin( )/glEnd( ) pair with a single vertex-data list comprising all of the graphics vertex calls in the original continuous series of graphics primitive calls.
As noted above, certain primitives may be rendered in a strip form. These include line, triangle and quadrilateral strips. As described above, each discrete primitive in each type of strip shares at least one common vertex with a neighboring discrete primitive of the strip. Rendering such common vertices twice requires unnecessary duplicative operations to be performed by the graphics system. In accordance with one embodiment of the primitive coalescing process of the present invention, the graphics primitive calls may be further coalesced to eliminate the graphics vertex calls rendering redundant vertices if the proper number of common vertices exist between neighboring discrete primitives. Such a modification would necessarily include the altering the type of the glBegin( )/glEnd( ) pair to specify a primitive strip rather discrete primitive. Thus, the primitive coalescing optimizer optimizes the series of graphics primitive calls by eliminating the redundant processing associated with the duplicative glBegin( )/glEnd( ) pairs and duplicative vertices.
In another embodiment, the optimizers include a common state primitive compiler. The common state primitive compiler remedies frequent toggling of state values by the graphics application. Oftentimes, state values will be set to particular values to render a number of primitives, followed by the setting of the same state to other values to render additional primitives. Subsequently, the same state is again changed to the original values to render still additional primitives. Such toggling of state values results in duplicative graphics state calls being implemented by the graphics system. A continuous series of primitive command sets that are to be rendered with the same state values appears in the captured graphics call sequence with one or more graphics state calls preceding them. A compiled list of primitive command sets preceded by the graphics state calls necessary to establish the common state values are output as an optimized graphics call sequence. Generation of such an optimized graphics call sequence will cause all requisite graphics state calls for a series of primitives to be issued once prior to the rendering of that series of primitives. This significantly reduces the toggling of graphics states and the processing overhead associated with such state changes.
In another embodiment, the optimizers include a primitive type converter. The primitive type converter converts the primitive type specified in a primitive command set from a non-combinable primitive type to a combinable primitive type. Non-combinable primitive types are those types of primitives that cannot be coalesced by the graphics primitive coalescer. The specified primitive types which are non-combinable are a function of the implemented graphics interface. However, it is not uncommon for graphics applications to generate a primitive command set that specifies a non-combinable primitive type when the vertices contained within the primitive command set could also be typed as a combinable primitive type, if the appropriate number of graphics vertex calls are specified.
In another embodiment, the optimizers include a vertex array builder. The vertex array builder creates a vertex array having vertices identified in a series of graphics vertex calls of a primitive command set, and generates the associated offset for reference by a graphics array call that uses the array of vertices to render a number of specified primitives. Establishing such an array and providing such graphics calls reduces the number of graphics calls that are sent to the graphics system to render a desired series of graphics primitives. Furthermore, such an approach utilizes available high performance array processing capabilities provided by some implementations the OpenGL API, reducing the processing time that would have been incurred in processing a long series of graphics primitive calls.
In another embodiment, the optimizers include a vertex loop generator. The vertex loop generator is applied to any repetitive series of graphics calls occurring in a primitive command set. The vertex loop generator generates a repeatable loop to efficiently process repetitive series of graphics calls. This optimization is typically implemented after all other previously described optimizations have been implemented. This optimization has the advantage of converting graphics calls occurring in a primitive command set into a form that, because of cache behavior and other factors, yields performance results similar to the graphics application. Modeling cache behavior employed in present day computing systems enhances the value of the performance characteristic analysis made between the original graphics call sequence and the optimized graphics call sequence.
In another aspect of the present invention, an apparatus for optimizing a graphics call sequence generated by a graphics application in accordance with a particular graphics API is disclosed. The apparatus may be implemented either in graphics hardware or a driver comprising the graphics system. Alternatively, the apparatus may be included in a driver associated with the graphics application. This provides the significant advance of benefiting from the above optimizations during real-time operations of the graphics environment.
In another aspect of the invention, a graphics environment is disclosed. The graphics environment includes a graphics application communicating with a graphics system through a first driver interfaced directly with the graphics application and a second driver interfaced directly with the graphics system, the drivers communicating with each other using a predetermined graphics interface. Implemented in the graphics environment is a system for identification and assessment of performance optimizations implemented in the graphics environment, said identification and assessment of said performance optimizations based upon an optimized graphics call sequence generated by an application of one or more optimizations applied to a captured graphics call sequence occurring between said first and second drivers.