1. Technical Field
The present application relates generally to instruction tracing. More specifically, the present application is directed to scaling instruction intervals from phases of instruction traces in order to identify collection points for representative instruction traces.
2. Description of Related Art
Modern software workloads may have dynamic instruction path lengths of trillions of instructions for a single dataset. For example, the program 464.h264ref, a reference implementation of the H.264/AVC (Advanced Video Coding) video compression standard, may execute more than 3.2 trillion dynamic instructions when run to completion on a processor using a single dataset.
Trace-driven performance simulators are used to assess design changes and to project workload performance for future processors. These simulators may execute on the order of ten thousand instructions per second on modern machines. Thus, for a program with one trillion dynamic instructions, simulation could take about 100 million seconds, or approximately 3.2 years, to complete. However, prior work has shown that the dynamic instructions in a workload often exhibit phases of execution, i.e., repetitive sequences of instructions that correlate strongly with the basic blocks being executed by a program. Creating a representative trace from only the prominent program phases significantly reduces the number of instructions that must be simulated.
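The simulation-time estimate above is simple division: dynamic instruction count divided by simulator throughput, converted to years. The short Python sketch below reproduces that arithmetic. The function name `simulation_years` and the 100-million-instruction representative trace size are hypothetical illustrations chosen here, not values taken from this application.

```python
# Assumption: a 365-day year; leap years are ignored.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds


def simulation_years(dynamic_instructions: float, sim_rate_ips: float) -> float:
    """Years of wall-clock time to simulate a trace at a fixed rate (instructions/second)."""
    return dynamic_instructions / sim_rate_ips / SECONDS_PER_YEAR


SIM_RATE = 1e4  # ~ten thousand simulated instructions per second

# One trillion dynamic instructions at 10^4 instr/s is 10^8 seconds.
print(f"{simulation_years(1e12, SIM_RATE):.2f} years")    # 3.17 years

# 464.h264ref with ~3.2 trillion dynamic instructions on a single dataset.
print(f"{simulation_years(3.2e12, SIM_RATE):.2f} years")  # 10.15 years

# Hypothetical representative trace of 100 million instructions, in hours.
representative_hours = 100e6 / SIM_RATE / 3600
print(f"{representative_hours:.2f} hours")                 # 2.78 hours
```

Under these assumptions, shrinking the simulated instruction count from trillions to hundreds of millions moves turnaround from years to hours, which is the motivation for building representative traces from program phases.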
Some known systems use coarse-grained phases or fine-grained instruction blocks, obtained using statistical analysis techniques, to find a small number of instructions that represent, in proportion, the machine execution characteristics of a much larger number of instructions from the dynamic execution of a program. The output of these known systems is a set of begin and end instruction index pairs that indicate the portions, such as phases or instruction blocks, of the program execution or program trace that best represent the execution of the program. In many cases the phases are all the same length, and overall performance is obtained by multiplying the performance result for each phase by the frequency with which that phase appears in the full program execution and summing the weighted results. Additionally, these systems may not obtain phases from all input datasets for simultaneous execution on a simulator.
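The weighting step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any particular known system: each phase is a hypothetical `Phase` record holding a begin/end instruction index pair (end exclusive), a frequency expressed as the fraction of the full execution that the phase represents, and a simulated performance result taken here to be cycles per instruction (CPI). The overall estimate is the frequency-weighted sum of the per-phase results.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Phase:
    begin: int        # index of the first instruction in the phase (inclusive)
    end: int          # index one past the last instruction (exclusive); assumed convention
    frequency: float  # fraction of the full execution this phase represents (assumed normalized)
    cpi: float        # simulated cycles per instruction for this phase (assumed metric)


def simulated_instructions(phases: List[Phase]) -> int:
    """Total instructions actually simulated across all representative phases."""
    return sum(p.end - p.begin for p in phases)


def weighted_cpi(phases: List[Phase]) -> float:
    """Project whole-program CPI as the frequency-weighted sum of per-phase CPI."""
    total = sum(p.frequency for p in phases)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"phase frequencies must sum to 1.0, got {total}")
    return sum(p.frequency * p.cpi for p in phases)


# Hypothetical example: three equal-length phases of 10 million instructions each.
phases = [
    Phase(begin=0,           end=10_000_000,  frequency=0.5, cpi=1.2),
    Phase(begin=50_000_000,  end=60_000_000,  frequency=0.3, cpi=0.8),
    Phase(begin=120_000_000, end=130_000_000, frequency=0.2, cpi=2.0),
]

print(simulated_instructions(phases))   # 30000000
print(f"{weighted_cpi(phases):.2f}")    # 1.24
```

Because the phases here are equal in length, weighting by frequency alone is sufficient. Phases of differing lengths, or phases drawn from several input datasets at once, would call for a different normalization, which is the gap discussed next.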
However, these known systems do not address representing a program with a specific number of instructions, including phases from each input dataset, or including all datasets at once, while ensuring that all program phases for all datasets are accurately represented in the trace, as may be important for efficient and accurate trace-driven program execution in a simulator system.