Traces of memory access patterns provide a window into program execution, allowing the simulation of memory systems with the goal of evaluating different cache designs. The analysis of cache designs is becoming even more crucial as caches become dramatically faster than main memory and cache misses become an ever more important factor in system performance.
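As an illustration of how an address trace drives a cache simulation, the following is a minimal sketch, assuming a direct-mapped cache and a trace given as a plain list of byte addresses. The function name, the parameters, and the example trace are hypothetical and are not taken from any particular simulator.

```python
def simulate_direct_mapped(trace, num_lines, line_size):
    """Return (hits, misses) for a direct-mapped cache fed an address trace."""
    tags = [None] * num_lines          # one stored tag per cache line
    hits = misses = 0
    for addr in trace:
        block = addr // line_size      # memory block containing this address
        index = block % num_lines      # cache line the block maps to
        tag = block // num_lines       # distinguishes blocks sharing a line
        if tags[index] == tag:
            hits += 1
        else:
            misses += 1
            tags[index] = tag          # replace the line contents on a miss
    return hits, misses


# Hypothetical trace: four passes over the same 64 bytes, one word at a time.
trace = [addr for _ in range(4) for addr in range(0, 64, 4)]

# Two candidate designs evaluated against the same trace.
large = simulate_direct_mapped(trace, num_lines=8, line_size=16)
small = simulate_direct_mapped(trace, num_lines=2, line_size=16)
print(large, small)
```

Running the same trace through differently sized caches, as in the last lines, is the kind of comparison that trace-driven evaluation makes possible.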
Many of the new processor designs use RISC technology with very fast on-chip caches and somewhat slower off-chip secondary caches. For example, a typical ratio of hit-to-miss costs on today's RISC machines might be 1 to 10, whereas machines currently being designed might have a ratio of 1 to 10 for a hit on the secondary cache or 1 to 200 if the miss must go to main memory. In order to evaluate the appropriate sizes and characteristics of memory systems for the new RISC machines, their behavior must be simulated. While traces are available for a variety of CISC machines, neither traces nor mechanisms for producing them are available for RISC machines.
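The effect of these cost ratios on average access cost can be worked through with a short calculation. The sketch below uses the 1, 10, and 200 cost figures from the example above; the miss rates are assumptions chosen only for illustration, and the function name is hypothetical.

```python
def average_access_cost(l1_miss_rate, l2_miss_rate,
                        hit_cost=1, l2_hit_cost=10, memory_cost=200):
    """Average cost per reference, in units of one on-chip cache hit.

    A reference that misses on chip costs l2_hit_cost if the secondary
    cache holds the data, and memory_cost if it must go to main memory.
    """
    miss_cost = (1 - l2_miss_rate) * l2_hit_cost + l2_miss_rate * memory_cost
    return (1 - l1_miss_rate) * hit_cost + l1_miss_rate * miss_cost


# Assumed miss rates (not from the text): 5% on chip, 20% in the secondary cache.
two_level = average_access_cost(l1_miss_rate=0.05, l2_miss_rate=0.20)

# Single-level case with a 1-to-10 ratio: every on-chip miss costs 10.
one_level = average_access_cost(l1_miss_rate=0.05, l2_miss_rate=0.0)

print(round(two_level, 2), round(one_level, 2))
```

Under these assumed rates, the references that reach main memory account for most of the average cost, even though they are a small fraction of all references.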
There is reason to believe that RISC traces are sufficiently different from CISC traces to warrant the generation of fresh traces. RISC code for a program is often twice as large as corresponding CISC code, increasing the number and range of instruction references. Also, effective use of the large register sets built into RISC reduces the number of data references compared to code for a CISC machine. So, at the very least, the balance of instruction and data references will change markedly.
Unfortunately, existing methods for generating traces are inappropriate for use on RISC machines. The most common software method involves the simulation of a program's execution to record all of its instruction and data references. This method is both slow and limited. Simulation is slow for CISC programs, and slower still for RISC code, because RISC code contains many more instructions, each requiring a pass through the main simulator loop. A slowdown of 1000 times or more makes it impossible to obtain accurate traces of real-time behavior, including kernel and multiprogrammed execution. Hardware methods spy on address lines to trace execution in real time, but usually have limited capacity and are not sufficiently selective. Currently, the most accurate method involves microcode modification (Agarwal et al., "ATUM: A New Technique for Capturing Address Traces Using Microcode", Proceedings of the 13th Annual Symposium on Computer Architecture (IEEE, New York, June 1986) pp. 119-127). The microcode for a machine is modified to trap address references and generate trace data by watching the address bus and logging the addresses that it sees. A modified machine runs 20 times slower than an untraced machine. The method is not applicable to RISC machines, which generate low-level instructions to be executed directly, as there is no microcode to be modified.
An additional problem with existing methods is that they all involve the generation and storage of entire traces for later analysis. The requirement that traces be stored limits the length of the trace. The simulation of very large caches, such as those proposed for second-level caches in a number of machines, requires long traces if the caches are to reach a stable state during the simulation.
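A rough estimate shows why stored traces become a limit for large caches. The sketch below assumes that each cache line needs at least one miss to be filled, that misses occur about once every `refs_per_miss` references, and that each trace record occupies a fixed number of bytes. All names and numeric values are illustrative assumptions, and the result is only a lower bound, since real misses often replace lines that are already filled.

```python
def min_refs_to_fill(cache_bytes, line_bytes, refs_per_miss):
    """Lower bound on the references needed to touch every cache line once."""
    lines = cache_bytes // line_bytes   # number of lines in the cache
    return lines * refs_per_miss        # one miss per line, at best


def trace_storage_bytes(num_refs, bytes_per_record):
    """Storage needed if every reference is saved as a fixed-size record."""
    return num_refs * bytes_per_record


# Assumed figures: 4-megabyte cache, 16-byte lines, one miss per 100 references,
# 4 bytes stored per reference.
refs = min_refs_to_fill(cache_bytes=4 * 2**20, line_bytes=16, refs_per_miss=100)
storage = trace_storage_bytes(refs, bytes_per_record=4)
print(refs, storage)
```

Under these assumptions, merely filling the cache takes tens of millions of references, before any steady-state behavior can be measured.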
Thus, the problems to be addressed in developing the method for acquiring very long traces, disclosed in detail below, include the following. The traces must be complete. They must represent the kernel and multiple users as they execute on a real machine. The memory references must be interleaved as they are during execution, rather than being assembled by artificially interleaving separate traces.
The traces must be accurate. The trace generation must be fast enough not to perturb the accuracy of the traces. That is, the mechanism used must not slow down execution to the extent that the behavior of the system is no longer realistic.
The tracing must be flexible. The method should allow picking and choosing the processes to be traced, optionally tracing kernel execution, and turning tracing on and off at any time.
The traces must be sufficiently long to make possible the realistic simulation of multimegabyte caches.