1. Field of the Invention
This invention relates to processor simulation and modeling techniques and, more particularly, to generation of instruction traces for processor simulation.
2. Description of the Related Art
As processor designs become increasingly complex, various types of pre-manufacturing simulation and modeling play increasingly significant roles in design success. For example, the overall performance of a given processor typically depends on its microarchitectural configuration, but different design solutions may have significantly different effects on processor performance and design complexity. Thus, performance simulation that assesses the relative merits of various microarchitectural configurations before substantial design effort is invested may improve the overall performance of the resulting processor. The extent of this benefit depends on how well the quality and comprehensiveness of the simulation reflect the workloads that the resulting processor actually encounters.
Once high-level features of a given processor implementation have been chosen and the implementation process begins, further simulation may be used to ensure that the processor satisfies functional and performance expectations. For example, verification tests may be performed on representations or models of the processor to ascertain whether the model is functionally correct (e.g., whether it produces results in accordance with the defined behavior of the processor's architecture). Similarly, performance tests may be performed on processor models to determine whether the model produces correct results at the level of performance predicted by earlier microarchitectural performance simulation.
Processors are becoming increasingly capable of parallel execution of different threads of instructions, for example via multithreaded and/or multicore designs. As a result, accurate simulation of such processors depends on workloads that fully exercise the processing resources of the design. For example, a processor under design may be configured to support four independent threads of concurrent processing activity. However, if such a processor were simulated using only two threads of processing activity as a test workload, important effects of the design under a full workload may be overlooked (e.g., whether a shared cache can support four concurrently executing threads without starving or stalling one or more threads).
Generating workloads that reflect large degrees of thread-level parallelism for use in simulation may present numerous challenges, however. In some approaches, the execution behavior of a real system may be captured, appropriately modified, and used as a workload or stimulus for model-based simulation. In some instances, however, no existing real system supports the degree of parallelism of the processor under development.
In other approaches, an appropriate workload may be generated through another simulation process. However, simulation typically runs far more slowly than actual system hardware, and this gap widens as the degree of parallelism being simulated increases. Thus, while a simulator may be configured to generate workloads having an arbitrary degree of thread-level parallelism, the simulation time required to generate such workloads may severely limit the utility of this approach. Further, simply duplicating threads of existing traces to increase the overall parallelism of a trace may introduce artifacts that significantly distort its execution behavior.