As technology advances, computer systems include greater numbers of processors in the form of multiprocessor systems, e.g., via one or more multi-core processors, that can execute multiple threads concurrently. The ever-increasing number of cores and logical processors in a system enables more software threads to be executed. While this trend benefits users in the form of increased processing power and computation ability, difficulties can arise. For example, the increase in the number of software threads that may be executed simultaneously can create problems with synchronizing data shared among the software threads. One common solution to accessing shared data in multiple core or multiple logical processor systems uses locks to guarantee mutual exclusion across simultaneous accesses to shared data. Such locking mechanisms can be detrimental to system performance, however, and may cause program failures, e.g., due to lock contention or other unwanted behavior. Multithreaded execution can have other adverse effects as well, and imprecise software can lead to performance degradation or errors in execution.
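As a purely illustrative sketch of the lock-based mutual exclusion mentioned above, the following Python fragment has several threads increment one shared counter while holding a lock. The names `run_counter` and `worker` are hypothetical and chosen only for this example; nothing here is drawn from a particular system.

```python
import threading


def run_counter(num_threads: int, increments: int) -> int:
    """Increment one shared counter from several threads under a lock."""
    lock = threading.Lock()
    total = 0

    def worker() -> None:
        nonlocal total
        for _ in range(increments):
            # Only one thread at a time may perform this read-modify-write.
            with lock:
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total
```

Every increment must acquire the same lock, so the threads serialize on it. This is the kind of lock contention that can limit performance as the number of threads grows.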
Accordingly, software performance investigations can occur to determine a cause of a problem or to improve software performance. Some analysis and debugging can be aided by a performance monitoring unit of a processor. However, such analysis often requires that a developer understand how the software arrived at a software performance bottleneck or a point of interest. For example, it is usually not sufficient to provide data showing that a given function is causing eviction of large amounts of the contents of a cache memory, known as cache thrashing. Investigating a software bottleneck often requires a call stack to the function that resulted in a large number of cache line replacements. The most typical solution to this problem is for a software performance analysis tool to output the most frequent call stacks to a function of interest, utilizing instrumentation or other intrusive methodologies. But such methodologies suffer from various drawbacks, including complexity, intrusiveness, and the collection of more information than is needed for debugging or other purposes.
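To make the notion of "most frequent call stacks to a function of interest" concrete, the following minimal Python sketch counts recorded call stacks that end in a given function and reports the most common ones. It assumes each sample is a list of function names ordered from outermost caller to innermost callee. The helper `top_call_stacks` and the sample function names are hypothetical, and the sketch says nothing about how a real tool would collect the samples.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

CallStack = Tuple[str, ...]


def top_call_stacks(
    samples: Iterable[Sequence[str]],
    function_of_interest: str,
    limit: int = 3,
) -> List[Tuple[CallStack, int]]:
    """Return the most frequent call stacks whose innermost frame is the function of interest."""
    counts = Counter(
        tuple(stack)
        for stack in samples
        # Keep only non-empty stacks that end in the function of interest.
        if stack and stack[-1] == function_of_interest
    )
    return counts.most_common(limit)


# Hypothetical sampled stacks, outermost caller first.
samples = [
    ["main", "parse", "copy_buf"],
    ["main", "render", "copy_buf"],
    ["main", "parse", "copy_buf"],
    ["main", "render", "draw"],
]
hot_paths = top_call_stacks(samples, "copy_buf")
```

Here `hot_paths` ranks the path through `parse` ahead of the path through `render`, which points a developer at the caller most often responsible for reaching `copy_buf`.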