Improving the efficiency of graphics processing units (GPUs) running graphical applications has long been a concern of software developers. For example, bottleneck analysis is important to optimizing an application, given that a pipelined GPU is only as fast as its slowest pipeline unit. Similarly, it is desirable to monitor the utilization of each unit of a graphics pipeline, thereby allowing the load to be spread across units and ensuring sufficient utilization of each unit at any given time. However, optimizing GPU performance is a daunting task given the limited number of performance tools available and the limited number of features that conventional tools offer.
In an attempt to isolate a GPU bottleneck, conventional performance tools require the user to run a series of tests, each intended to target a suspect unit of the graphics pipeline. The tests may involve varying characteristics of the pipeline while processing data, where a variation in frame rate may indicate a bottleneck in a certain area of the pipeline. However, such tests are time-consuming and tedious, and they often produce misleading and/or vague results given the dependency of various pipeline units on one another. Moreover, modern GPUs employ multiple pipelines for increased processing power, thereby presenting additional dependencies that make isolating a single bottlenecking unit very difficult.
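The test procedure described above can be illustrated with a minimal sketch. It assumes an idealized pipeline in which the frame time equals the time of the slowest unit and the units do not depend on one another. The unit names, the timings, and the helper functions `frame_rate` and `probe_stages` are hypothetical and chosen only for illustration; they are not taken from any actual performance tool or GPU.

```python
# Toy model only: unit names and timings are hypothetical, not from a real GPU.

def frame_rate(stage_ms):
    """Frames per second when the frame time equals the slowest unit's time."""
    return 1000.0 / max(stage_ms.values())


def probe_stages(stage_ms, scale=0.5):
    """Scale each unit's workload in turn and record the frame-rate change."""
    baseline = frame_rate(stage_ms)
    deltas = {}
    for name in stage_ms:
        varied = dict(stage_ms)            # copy so the baseline stays intact
        varied[name] = stage_ms[name] * scale
        deltas[name] = frame_rate(varied) - baseline
    return deltas


# Assumed per-frame time (in milliseconds) spent in each pipeline unit.
stage_ms = {"vertex": 4.0, "raster": 2.0, "pixel": 10.0}

deltas = probe_stages(stage_ms)
suspect = max(deltas, key=deltas.get)      # unit whose variation moved the frame rate most
print(deltas, suspect)
```

In this idealized model, only the variation applied to the slowest unit changes the frame rate, so that unit is readily singled out. In an actual GPU the units depend on one another, so a variation in one unit may shift the load on others, which is why such tests may yield the misleading or vague results noted above.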
In addition to bottleneck analysis, software developers are also concerned with equalizing and increasing utilization of the pipeline units. However, conventional performance tools fail to provide utilization information directly. As such, even if a bottlenecking unit is identified, any correction made to reduce the bottleneck may negatively impact the utilization of other units. Thus, because developers are unable to monitor utilization information pertaining to the pipeline units, they often encounter minimal performance gains even when a bottleneck is corrected.
In addition to the inadequacies discussed above, conventional performance tools commonly display performance data as averages and as transient, scrolling data. As such, granular GPU analysis of unit bottlenecks and utilization at the graphical frame level, and further at the graphical operation level, is essentially impossible. Moreover, even if a developer is able to remedy a bottleneck and increase the utilization of an under-utilized unit for a given frame or graphical operation, performance for other frames and/or graphical operations may decrease. Thus, much time and effort is likely to be spent using conventional performance tools with little or no appreciable increase in the performance of the graphical application on a given GPU.