Computer processor designers rely heavily on benchmark simulations to evaluate various design alternatives. To this end, significant emphasis is placed on accurately modeling the design choices in software simulators. Although processing power has increased, accurate modeling of a complex design may dramatically reduce simulation speed, thereby restricting the ability to study tradeoffs between design alternatives. To address this issue, researchers sometimes simulate only a small fraction of the overall program execution, in the hope that the simulated fraction is a good representation of the overall program behavior. However, recent studies have shown that programs exhibit different behaviors during different execution phases that occur over a long time period. Consequently, there is tension between the need to reduce the time required for accurate simulations and the need to simulate program execution over a long period of time to accurately capture the phase behavior.
The behavior of a program is not random. As programs execute, they exhibit cyclic behavior patterns. Recent research has shown that it is possible to accurately identify and predict phases in program execution. An understanding of the phase behavior of a program can be exploited to perform accurate architecture simulation, to compress program traces, to conserve power by dynamically reconfiguring caches and processor width, to guide compiler optimizations, and/or to provide feedback to the programmer to guide program optimization.
Prior work on phase classification divides a program's execution into intervals. An interval is a contiguous portion of execution (e.g., a slice in time) of a program. Intervals that exhibit similar behavior (e.g., a similar number of instructions per cycle (IPC), similar cache miss rates, similar branch miss rates, etc.) are classified as members of a phase. The intervals that belong to a given phase need not be located together (e.g., adjacent in time). Instead, intervals that belong to a given phase may appear throughout the program's execution. Some prior work uses an off-line clustering algorithm to break a program's execution into phases, and then performs fast and accurate architecture simulation by simulating a single representative portion of each phase of execution. One example tool for performing this type of analysis is the Automated Phase Analysis and Recognition Tool (iPART) from Intel (B. Davies et al., Ipart: An Automated Phase Analysis and Recognition Tool, tech. report, Microprocessor Research Labs, Intel Corp., November 2003).
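The interval and phase concepts described above can be illustrated with a minimal sketch. It is not the iPART algorithm, nor the off-line clustering algorithm of any cited work; it substitutes a simple greedy "leader" clustering, in which each interval joins the first phase whose founding interval lies within a distance threshold and otherwise starts a new phase. The names `Interval`, `classify_phases`, and `representatives`, the sample measurements, and the threshold of 0.2 are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Interval:
    """One contiguous slice of execution with hypothetical measurements."""
    index: int
    ipc: float               # instructions per cycle
    cache_miss_rate: float
    branch_miss_rate: float

    def features(self):
        return (self.ipc, self.cache_miss_rate, self.branch_miss_rate)


def distance(a, b):
    """Euclidean distance between two feature vectors (unnormalized)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_phases(intervals, threshold=0.2):
    """Assign a phase id to each interval using greedy leader clustering.

    Assumption: an interval belongs to the first phase whose founding
    interval (the "leader") is within `threshold`; otherwise it founds
    a new phase. Real tools use more robust clustering.
    """
    leaders = []    # feature vector of the first interval in each phase
    phase_ids = []  # phase id per interval, in execution order
    for interval in intervals:
        for phase_id, leader in enumerate(leaders):
            if distance(interval.features(), leader) <= threshold:
                phase_ids.append(phase_id)
                break
        else:
            leaders.append(interval.features())
            phase_ids.append(len(leaders) - 1)
    return phase_ids


def representatives(phase_ids):
    """Map each phase id to the position of its first interval."""
    reps = {}
    for position, phase_id in enumerate(phase_ids):
        reps.setdefault(phase_id, position)
    return reps


# Hypothetical measurements: two behaviors that alternate over time.
intervals = [
    Interval(0, 1.80, 0.02, 0.01),
    Interval(1, 0.60, 0.30, 0.05),
    Interval(2, 1.75, 0.03, 0.01),
    Interval(3, 0.65, 0.28, 0.06),
    Interval(4, 1.82, 0.02, 0.02),
]

phase_ids = classify_phases(intervals)
print(phase_ids)                   # [0, 1, 0, 1, 0]
print(representatives(phase_ids))  # {0: 0, 1: 1}
```

In this sample, intervals 0, 2, and 4 fall into one phase and intervals 1 and 3 into another, even though the members of each phase are not adjacent in time. Simulating only the representative interval of each phase is the idea behind the fast-simulation approach described above. The features are left unnormalized here for brevity, so IPC dominates the distance; a practical classifier would likely scale each metric first.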
A software program can contain multiple threads that execute different instructions of the program simultaneously or almost simultaneously. For example, multiple threads may allow multiple users to execute a software program simultaneously on a single computer. Multi-threaded software programs may be quite complex and may be more difficult to analyze than single-threaded software programs. For example, if multiple threads attempt to access a hardware resource simultaneously, one thread may be delayed until the other thread finishes accessing the resource. Further, simultaneous execution of multiple threads can change program phases or result in new phases that would not occur if only one thread were executing. Moreover, if threads are spawned at different times from execution to execution, the phases defined based on system resources may differ from execution to execution. The complexities of analyzing multi-threaded software programs executing on one or more computer processors have created a desire for a tool to analyze such multi-threaded software programs.