A typical modern computer system may include a multi-core processor, which includes multiple processing cores that execute a large number of threads. A relatively complex software stack may run on the multi-core processor. As such, a given software function, such as a draw call, may not be executed serially; instead, the function is typically divided into a multitude of tasks, which are executed across many threads on the processing cores. Additionally, the time at which a specific task executes is typically not deterministic, because any single task may, in principle, be preempted by the scheduler, removed from execution mid-stream, and rescheduled at a later time on another thread and/or core.
Given these complexities, it may be challenging for an analysis program to determine which monitored performance metrics, such as cache misses and execution stalls, are attributable to a specific task, group of tasks, or software function.
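
The attribution difficulty described above can be illustrated with a minimal sketch. It is not drawn from any particular system: the names `run_task`, `run_functions`, and `threads_per_function`, the function label `physics_update`, and the use of a Python thread pool as a stand-in for a scheduler are all assumptions made for illustration. Two software functions are each divided into tasks, the tasks share a pool of worker threads, and each task records the thread on which it happened to run.

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def run_task(function_name, task_id):
    """Hypothetical task body: does a little work and reports where it ran."""
    work = sum(i * i for i in range(1000))  # stand-in for real work
    return {
        "function": function_name,
        "task": task_id,
        "thread": threading.current_thread().name,
        "work": work,
    }


def run_functions(functions, tasks_per_function=4, workers=3):
    """Divide each function into tasks and run all tasks on one shared pool."""
    with ThreadPoolExecutor(max_workers=workers,
                            thread_name_prefix="worker") as pool:
        futures = [
            pool.submit(run_task, name, task_id)
            for name in functions
            for task_id in range(tasks_per_function)
        ]
        return [future.result() for future in futures]


def threads_per_function(records):
    """Group the worker threads that executed tasks of each function."""
    by_function = defaultdict(set)
    for record in records:
        by_function[record["function"]].add(record["thread"])
    return dict(by_function)


# "draw_call" follows the example in the text; "physics_update" is invented.
records = run_functions(["draw_call", "physics_update"])
placement = threads_per_function(records)
for name, threads in placement.items():
    # The thread sets may differ from run to run.
    print(name, "ran on", sorted(threads))
```

Because the placement of tasks on threads can change between runs, and a single thread may execute tasks from both functions, a counter that is sampled only per thread or per core would mix contributions from both. In this sketch, attribution is possible only because each task explicitly records its own function and thread.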