Computer applications having concurrent threads executed on multiple processors hold great promise for increased performance but also present great challenges to developers. The growth of raw sequential processing power has flattened as processor manufacturers have reached roadblocks in providing significant increases in processor clock frequency. Processors continue to evolve, but the current focus for improving processor power is to provide multiple processor cores on a single die to increase processor throughput. Sequential applications, which previously benefited from increased clock speed, obtain significantly less scaling as the number of processor cores increases. In order to take advantage of multi-core systems, concurrent (or parallel) applications are written to include concurrent threads distributed over the cores. Parallelizing applications, however, is challenging in that many common tools, techniques, programming languages, and frameworks, and even the developers themselves, are adapted to create sequential programs.
Optimizing parallel performance can be time-consuming, difficult, and error-prone because there are so many independent factors to track. One set of factors involves scheduling priorities and how thread mapping affects system performance. A scheduler controls multitasking with scheduling priority algorithms that determine how threads receive processor time slices. At times, a thread executing on one processor core can be stopped, moved to another core, and continued. Each thread has access to memory in order to load instructions to execute, to load saved data to read, or to save produced data. Data and instructions are usually stored in one or more caches accessible to the processor core to reduce memory latency. The set of data and instructions used by the thread to execute in a certain window of time is often referred to as the thread's working set. Moving a thread to a different core may require the thread to reload its working set from memory or other caches, resulting in significant performance penalties. Tools intended for sequential applications provide no information on how concurrent threads are scheduled on processor cores and provide no meaningful insight into the scheduling effects of concurrent threads. Understanding the behavior of concurrent threads in parallel applications and their interactions with the processing resources of a computing device is a challenge with current developer tools.
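The cost of moving a thread away from its cached working set can be illustrated with a toy model. The following Python sketch is an assumption-laden illustration only, not a description of any actual scheduler or cache hardware: the names CoreCache and count_misses, the cache capacity, and the schedules are all hypothetical, each core is given a private cache with least-recently-used eviction, and shared caches, cache coherence, and competing threads are ignored.

```python
from collections import OrderedDict


class CoreCache:
    """Toy private cache for one core: fixed capacity, LRU eviction (assumed model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def access(self, address):
        """Return True on a hit, False on a miss (the line is then loaded)."""
        if address in self.lines:
            self.lines.move_to_end(address)  # mark as most recently used
            return True
        self.lines[address] = None
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        return False


def count_misses(schedule, working_set, num_cores=2, capacity=64):
    """Run one thread through `schedule` (one core id per time slice).

    In every time slice the thread touches each address in its working set
    once. The return value is the total number of cache misses.
    """
    caches = [CoreCache(capacity) for _ in range(num_cores)]
    misses = 0
    for core in schedule:
        for address in working_set:
            if not caches[core].access(address):
                misses += 1
    return misses


working_set = range(32)  # hypothetical working set of 32 cache lines

pinned = count_misses([0, 0, 0, 0], working_set)    # thread stays on core 0
migrated = count_misses([0, 0, 1, 1], working_set)  # thread moves to core 1

print(pinned, migrated)  # prints "32 64"
```

In this model the thread that stays on one core loads its working set once, while the thread that is moved to a second core must load the entire working set again on arrival, doubling the number of misses. On real hardware the penalty depends on the cache hierarchy and on what other threads have evicted in the meantime.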