1. Field of the Invention
The present invention generally relates to graphics chip performance analysis and, more specifically, to triggering performance event capture via pipelined state bundles.
2. Description of the Related Art
Graphics processing hardware typically includes circuitry known as a graphics processing pipeline. The graphics processing pipeline may be divided into stages. Information, including graphical data and instructions, passes through the graphics processing pipeline from the earliest stage to the latest stage, and each stage processes the information by performing various associated functions and operations on the information. Stages can operate independently from one another, which enables the different stages to process different information at the same time. Processing different information simultaneously in this manner increases utilization of the graphics processing pipeline, which improves performance.
A graphics processing pipeline may be configured to provide “performance data” related to how efficiently graphics data is being processed in the graphics processing pipeline. Such performance data may include the time required to complete a specific task, the amount of data processed during a certain period of time, and other like measurements. Typically, performance data is obtained regarding a particular unit of work. Such a unit of work may be an “instance,” which, as used herein, refers to a set of related graphics processing pipeline methods (such as draw calls) that are all performed on a certain rendering “state”.
With some conventional graphics processing pipeline architectures, performance data may be analyzed instance-by-instance. An instruction to begin monitoring performance may be sent into the graphics processing pipeline, along with an instance for which performance data is desired. The time at which the instance is sent into the graphics processing pipeline and the time at which the results of the processed instance exit the graphics processing pipeline may be used to determine how much time is required to process the instance. In addition, the values of counters that track specific items of work completed when processing the instance, such as the number of pixels shaded, may also be recorded.
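By way of illustration only, the instance-by-instance measurement described above may be sketched in software as follows. The sketch assumes a simulated clock and a fixed cost per shaded pixel; the names Instance, PerfRecord, monitor_instances, and cycles_per_pixel are hypothetical and do not correspond to any particular hardware or driver interface.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    name: str
    pixel_counts: List[int]  # pixels shaded by each draw call (hypothetical data)


@dataclass
class PerfRecord:
    name: str
    elapsed: int        # simulated cycles between entry and exit of the instance
    pixels_shaded: int  # counter value accumulated while processing the instance


def monitor_instances(instances, cycles_per_pixel=2):
    """Process instances one after another and record per-instance data.

    Assumption: processing cost is proportional to the pixels shaded.
    """
    clock = 0
    records = []
    for inst in instances:
        start = clock  # time at which the instance enters the pipeline
        pixels = 0
        for draw_pixels in inst.pixel_counts:
            clock += draw_pixels * cycles_per_pixel
            pixels += draw_pixels
        # Exit time minus entry time gives the time required for this instance.
        records.append(PerfRecord(inst.name, clock - start, pixels))
    return records


records = monitor_instances([Instance("A", [10, 5]), Instance("B", [20])])
```

Because each instance is processed to completion before the next one begins, every timestamp pair and counter value in this sketch can be attributed to exactly one instance.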
However, with some graphics processing pipeline architectures, the graphics processing pipeline may include a unit, known as a tiler unit, at an intermediate position in the graphics processing pipeline. The tiler unit is configured to receive graphical data from a first portion of the graphics processing pipeline and organize the graphical data into mutually exclusive constructs known as “tiles.” To do this, a tiler unit may accumulate graphical data and instructions from several instances that are sent into the graphics processing pipeline, subdivide and interleave the instances together to produce a combined workload, and send the combined workload into a second portion of the graphics processing pipeline. The second portion of the graphics processing pipeline processes data tile-by-tile, rather than instance-by-instance.
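As a rough illustration of the accumulate, subdivide, and interleave behavior described above, the following sketch bins axis-aligned rectangles from several instances into square tiles. It is a simplified model under stated assumptions: the geometry is limited to non-empty, half-open pixel rectangles, the tile size is fixed, and the name tile_instances and its parameters are hypothetical rather than taken from any actual tiler unit.

```python
from collections import defaultdict


def tile_instances(instances, tile_size=16):
    """Split each instance's rectangles at tile boundaries and group pieces by tile.

    `instances` maps an instance name to a list of half-open pixel rectangles
    (x0, y0, x1, y1). Rectangles are assumed to be non-empty.
    """
    tiles = defaultdict(list)
    for name, rects in instances.items():
        for (x0, y0, x1, y1) in rects:
            # Visit every tile that the rectangle overlaps.
            for ty in range(y0 // tile_size, (y1 - 1) // tile_size + 1):
                for tx in range(x0 // tile_size, (x1 - 1) // tile_size + 1):
                    # Clip the rectangle to the bounds of the current tile.
                    piece = (
                        max(x0, tx * tile_size),
                        max(y0, ty * tile_size),
                        min(x1, (tx + 1) * tile_size),
                        min(y1, (ty + 1) * tile_size),
                    )
                    tiles[(tx, ty)].append((name, piece))
    return dict(tiles)


# Two instances whose geometry overlaps the same two tiles.
instances = {"A": [(0, 0, 24, 8)], "B": [(8, 0, 32, 8)]}
combined = tile_instances(instances)
```

In the resulting combined workload, each tile holds pieces of both instance A and instance B, so work that entered the tiler grouped by instance leaves it grouped by tile.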
The presence of a tiler unit may make conventional performance monitoring, as described above, nearly impossible because the work being processed in the graphics processing pipeline downstream of the tiler unit is tile-based, not instance-based. In other words, because downstream stages in the graphics processing pipeline process one tile at a time, as opposed to one instance at a time, timestamps for when a tile enters and exits a given stage of the graphics processing pipeline may provide a total time related to processing several instances, as opposed to providing information about the time required to process a specific instance. Similarly, counters associated with a downstream stage of the graphics processing pipeline would provide counts related to processing several instances, as opposed to providing counts related to processing a specific instance. Without information related to how specific instances are processed in the graphics processing pipeline, debugging the graphics processing pipeline and analyzing ways to improve graphics processing pipeline performance are much more difficult.
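The loss of per-instance information may be illustrated with a small sketch of tile-based measurement. The sketch assumes a simulated clock, a fixed cost per shaded pixel, and a workload already organized by tile; the name measure_per_tile and the sample data are hypothetical.

```python
def measure_per_tile(tiled_work, cycles_per_pixel=2):
    """Record elapsed time and pixels shaded for each tile, not each instance.

    `tiled_work` maps a tile coordinate to a list of (instance, pixels) items.
    Assumption: processing cost is proportional to the pixels shaded.
    """
    clock = 0
    stats = {}
    for tile, work in tiled_work.items():
        start = clock  # timestamp when the tile enters the stage
        pixels = 0
        for _instance, n in work:
            # The timer and counter advance regardless of which instance
            # contributed the work, so instance identity is lost.
            clock += n * cycles_per_pixel
            pixels += n
        stats[tile] = {"elapsed": clock - start, "pixels": pixels}
    return stats


# Each tile contains interleaved work from instances A and B.
tiled_work = {
    (0, 0): [("A", 10), ("B", 20)],
    (1, 0): [("A", 5), ("B", 8)],
}
tile_stats = measure_per_tile(tiled_work)
```

Each entry in tile_stats is a total over both instances. For example, the 30 pixels counted for tile (0, 0) cannot be separated into the 10 pixels belonging to instance A and the 20 pixels belonging to instance B using the tile-level timestamps and counters alone.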
As the foregoing illustrates, what is needed in the art is a way to obtain performance data related to graphics processing pipeline workloads in a tile-based system.