Improving the efficiency of graphics processing units (GPUs) executing graphical applications has long been a primary concern of software developers. For example, bottleneck analysis is important to optimizing an application because a pipelined GPU is only as fast as its slowest pipeline unit. Similarly, it is desirable to monitor the utilization of each unit of a graphics pipeline, thereby allowing the load to be spread across units and ensuring sufficient utilization of each unit at any given time. However, optimizing GPU performance and debugging a graphics pipeline subunit are daunting tasks given the limited number of performance tools available and the limited number of features that conventional tools offer.
Conventional methods fail to provide detailed information about the setup of each subunit of the graphics pipeline and its state information when executing a graphical operation (e.g., a draw call). Accordingly, corrective actions are taken through trial and error, without true knowledge of the setup of each subunit and its corresponding state information for a draw call. Moreover, any correction intended to remedy a problematic subunit of the GPU may negatively impact other subunits of the GPU. For example, reducing the bottleneck of one subunit may negatively impact the utilization of other subunits. Thus, because developers are unable to monitor utilization information pertaining to the pipeline subunits, they often see minimal performance gains even when a bottleneck is corrected.
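The bottleneck reasoning above can be illustrated with a toy model. The following sketch assumes a simplified, fully overlapped pipeline in which the time per draw call equals the time of the slowest unit; the unit names and timings are hypothetical and are not taken from any real GPU or performance tool.

```python
# Toy model: in a fully overlapped pipeline, the time per draw call is set by
# the slowest unit. Unit names and timings are hypothetical, for illustration.

def bottleneck(unit_times_ms):
    """Return (name, time_ms) of the slowest unit."""
    name = max(unit_times_ms, key=unit_times_ms.get)
    return name, unit_times_ms[name]


def utilization(unit_times_ms):
    """Fraction of the draw-call time during which each unit is busy."""
    _, limit = bottleneck(unit_times_ms)
    return {name: t / limit for name, t in unit_times_ms.items()}


# Assumed per-unit times for one draw call, in milliseconds.
before = {"vertex": 2.0, "raster": 1.0, "pixel_shader": 4.0, "rop": 3.8}

# Assume an optimization halves the pixel shader time.
after = dict(before, pixel_shader=2.0)

print(bottleneck(before))   # slowest unit before the change
print(bottleneck(after))    # slowest unit after the change
print(utilization(after))   # per-unit busy fraction after the change
```

In this model, halving the pixel shader time reduces the draw-call time only from 4.0 ms to 3.8 ms, because the next-slowest unit becomes the new limit. Without per-unit utilization data, a developer would see the small gain but not the reason for it.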
In addition to the inadequacies discussed above, even if a developer is able to remedy a bottleneck and increase the utilization of an under-utilized subunit for a given frame or graphical operation, performance for other frames and/or graphical operations may decrease. Thus, much time and effort are likely to be spent using conventional performance tools with little or no appreciable increase in the performance of the graphical application on a given GPU.