In recent years, processing subunits, which are generally pipelined, have increased substantially in complexity. This added complexity makes it harder to determine the performance characteristics of a processor, so achieving optimal performance from a processor has become a daunting task. Because a processor pipeline runs only as fast as its slowest processing subunit, it is important to identify and address the slower stages of the pipeline in order to improve the efficiency of the processor.
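The point that a pipeline runs only as fast as its slowest subunit can be illustrated with a minimal sketch. The stage names and per-item times below are hypothetical, chosen only for illustration, and the model assumes a simple steady-state pipeline in which every work item passes through every stage.

```python
# Hypothetical per-stage service times, in nanoseconds per work item.
# The stage names and numbers are illustrative assumptions only.
stage_time_ns = {
    "vertex_fetch": 2.0,
    "unified_shader": 5.0,
    "raster": 3.0,
    "frame_buffer": 4.0,
}


def bottleneck_stage(times):
    """Return the name of the slowest stage (largest time per item)."""
    return max(times, key=times.get)


def steady_state_throughput(times):
    """Items per nanosecond once the pipeline is full: 1 / slowest stage time."""
    return 1.0 / max(times.values())


def stage_utilization(times):
    """Fraction of time each stage is busy when the pipeline runs at the bottleneck rate."""
    slowest = max(times.values())
    return {name: t / slowest for name, t in times.items()}


slowest = bottleneck_stage(stage_time_ns)
throughput = steady_state_throughput(stage_time_ns)
utilization = stage_utilization(stage_time_ns)
```

With these assumed numbers the bottleneck is `unified_shader`. In this model, speeding up any other stage leaves the throughput unchanged, which is why the slower stages are the ones worth identifying.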
In general, details of how an application executes on a processor, and of the processor's performance while executing it, are hidden from the user in order to protect the manufacturer's proprietary information. On the other hand, providing detailed information about an application executing on a processor and its performance allows software developers to improve the efficiency of their software running on that processor. Accordingly, there is a tradeoff between protecting the proprietary information of the processor's manufacturer and improving the performance of the software running on the processor.
Moreover, the increasing need for efficiency has also led to the development of multi-functional processing subunits (e.g., unified processing subunits). Unified processing subunits are not dedicated to processing one type of data, but may process a variety of data types. For example, in a graphics processing unit (GPU), a unified shader processing subunit may process vertex information, transform and generate new triangles, determine the corresponding pixels for the generated triangles, and compute color, lighting, and alpha values for these pixels. A multi-functional processing subunit therefore further complicates determining performance parameters, since the same subunit processes different types of data.
Unfortunately, it is difficult to isolate a particular processing subunit within the GPU pipeline merely by varying the workload. Moreover, there is a tradeoff between exposing internal GPU information and performance data and improving the frame rate. Furthermore, isolating a particular processing subunit (e.g., a multi-functional processing subunit) fails to provide performance parameters for the individual data types processed by a unified processing subunit. For example, isolating a unified shader processing subunit in a GPU pipeline and calculating its performance parameters (e.g., bottleneck and utilization) yields performance parameters for the unified shader subunit as a whole. As such, the calculated performance parameters provide no information about the individual types of data processed by the unified shader processing subunit (e.g., vertex, geometry, and pixel).
Accordingly, software developers are unaware of which type of data is causing a bottleneck in a unified processing subunit. Similarly, software developers are unaware of which data types maximize utilization of the unified processing subunit. Moreover, software developers are unaware of which types of data are computationally intensive (e.g., vertex intensive, pixel intensive, or geometry intensive data). In other words, software developers are unaware of the component workloads in a unified processing subunit. It is therefore difficult to improve the performance of a unified processing subunit without visualizing its component workloads.
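The gap between a whole-subunit figure and its component workloads can be made concrete with a small sketch. It assumes, purely for illustration, that busy-cycle counts per data type were available for one unified shader subunit over a sampling window; the counter values and function names are hypothetical and do not describe any real GPU interface.

```python
# Hypothetical busy-cycle counts for one unified shader subunit over a
# sampling window, broken down by the type of data being processed.
# All values are illustrative assumptions.
window_cycles = 1000
busy_cycles_by_type = {"vertex": 150, "geometry": 50, "pixel": 700}


def unit_utilization(busy_by_type, window):
    """Aggregate utilization: what measuring the subunit as a whole reports."""
    return sum(busy_by_type.values()) / window


def workload_share(busy_by_type):
    """Share of the subunit's busy time attributable to each data type."""
    total = sum(busy_by_type.values())
    if total == 0:
        return {name: 0.0 for name in busy_by_type}
    return {name: cycles / total for name, cycles in busy_by_type.items()}


aggregate = unit_utilization(busy_cycles_by_type, window_cycles)
shares = workload_share(busy_cycles_by_type)
dominant_type = max(shares, key=shares.get)
```

The aggregate value alone says only that the subunit is heavily used. The per-type shares are what would tell a developer that, in this assumed example, pixel work dominates, and that per-type information is exactly what whole-subunit parameters do not provide.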