Modern general-purpose and graphics processors may include one or more processing cores, and these processing cores may run a relatively large number of threads. Therefore, analyzing the performance of a multi-core processor may be a complex undertaking, given the number of tasks and the number of different threads that may be running.
Analyzing the performance of certain software may involve capturing a buffer that records what each thread in the process does and using analysis tools to generate reports and visualizations of what occurred in the application. Challenges arise in comparing data collected across different application sessions, a comparison called “differencing.”
More specifically, in a conventional serially-executed application, differencing is relatively straightforward because the relative sequence of function calls or tasks is usually deterministic. As a result, a conventional differencing algorithm may scan the list of records in the file to establish a correspondence between records relatively quickly. However, in parallel-executed applications, the assignment of tasks to threads is rarely deterministic. Similarly, the time at which a task executes on a given thread is also nondeterministic. As a result, even for two runs of an application that were given exactly the same input, it is relatively hard to determine one-to-one correspondences between individual tasks.
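The contrast between serial and parallel differencing can be illustrated with a minimal sketch. It is not taken from any particular analysis tool. The `Record` layout, the task names, the thread identifiers, and the durations are all hypothetical, and `positional_diff` is only one simple way a sequential scan might pair records. The sketch pairs the i-th record of one run with the i-th record of another, which works when the order is deterministic and fails when tasks are reordered across threads.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    task: str         # hypothetical task or function name
    thread: int       # hypothetical thread identifier
    duration_us: int  # hypothetical measured duration in microseconds


def positional_diff(run_a, run_b):
    """Pair the i-th record of run_a with the i-th record of run_b.

    Returns (task_a, task_b, matched, delta) tuples, where delta is the
    change in duration when the task names agree and None otherwise.
    """
    pairs = []
    for a, b in zip(run_a, run_b):
        matched = a.task == b.task
        delta: Optional[int] = b.duration_us - a.duration_us if matched else None
        pairs.append((a.task, b.task, matched, delta))
    return pairs


# Serial execution: both runs record the tasks in the same order.
serial_a = [Record("load", 0, 120), Record("parse", 0, 300), Record("render", 0, 450)]
serial_b = [Record("load", 0, 110), Record("parse", 0, 340), Record("render", 0, 430)]

# Parallel execution (assumed data): same input, but the second run places
# tasks on different threads and records them in a different order.
parallel_a = [Record("load", 0, 120), Record("parse", 1, 300), Record("render", 2, 450)]
parallel_b = [Record("parse", 2, 340), Record("load", 1, 110), Record("render", 0, 430)]

serial_pairs = positional_diff(serial_a, serial_b)
parallel_pairs = positional_diff(parallel_a, parallel_b)

for label, pairs in (("serial", serial_pairs), ("parallel", parallel_pairs)):
    for task_a, task_b, matched, delta in pairs:
        print(label, task_a, task_b, matched, delta)
```

In the serial case every positional pair refers to the same task, so the duration deltas are meaningful. In the parallel case the first two pairs compare unrelated tasks, which is the correspondence problem described above.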