A processor may include a set of microcode routines that lie dormant until activated by a software write to a control register (e.g., by a Write to Model Specific Register (WRMSR) instruction). The set of microcode routines is referred to herein as “tracer,” which may be used as a tool to debug the processor and tune its performance. Once tracer is activated, various events can trigger it to gather processor state information and write it to specified addresses in memory. One way to use tracer is to invoke it at regular intervals. For example, every time the processor has executed and retired N instructions (e.g., 100,000 instructions; the number is specified by the user), tracer dumps the processor state to memory. The dumped processor state is referred to herein as a checkpoint. An engineer debugging the processor may then take the processor state from the checkpoints and replay it into a simulator.
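The interval-based use of tracer can be illustrated with a minimal sketch. It is a toy model written only for illustration: the function name `run_with_tracer`, the two-field state, and the "add 3 per instruction" behavior are all assumptions, and real tracer is microcode, not Python.

```python
from typing import Dict, List

State = Dict[str, int]  # hypothetical stand-in for the processor state


def run_with_tracer(total_instructions: int, n: int) -> List[State]:
    """Toy model: retire instructions one at a time and dump a
    checkpoint (a copy of the state) after every n retired instructions."""
    state: State = {"retired": 0, "acc": 0}
    checkpoints: List[State] = []
    for _ in range(total_instructions):
        # Assumed behavior of every "instruction": add 3 to an accumulator.
        state["acc"] += 3
        state["retired"] += 1
        if state["retired"] % n == 0:
            # Copy the state so later execution does not alter the dump.
            checkpoints.append(dict(state))
    return checkpoints


if __name__ == "__main__":
    for cp in run_with_tracer(total_instructions=10, n=4):
        print(cp)
```

With `total_instructions=10` and `n=4`, the sketch dumps two checkpoints, one after the 4th and one after the 8th retired instruction.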
The simulator receives the processor state from the checkpoint as part of its input. The input is the state of the registers (and optionally of the cache memories of the processor) and the state of memory, which includes the programs executed by the processor. The simulator is a functional model of a “golden” processor. That is, the simulator starts with the initial input state of the processor and executes and retires the instructions of the programs in memory to produce the output state that a processor conforming to the target processor architecture (e.g., x86 architecture) would produce. This output state can then be compared to the output state generated by the actual processor, which may be helpful in debugging design errors. The process is broadly described here:
1. Processor executes/retires N instructions and tracer dumps state checkpoint to memory.
2. Tracer restarts the processor executing where it left off. (In one implementation, tracer resets the processor and the reset microcode re-loads the processor state from the state checkpoint just dumped to memory.)
3. Steps 1 and 2 continue until the user detects that the bug has occurred, stops the cycle, and saves the state checkpoints to a file.
4. Feed the first state checkpoint from the file to the simulator.
5. The simulator executes/retires N instructions.
6. Compare the current simulated processor state with the next state checkpoint, and if they mismatch, the logic designer uses the information to debug the processor.
7. Otherwise, feed the next state checkpoint from the file to the simulator and then repeat steps 5 and 6.
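Steps 4 through 7 above amount to a replay-and-compare loop, sketched below under stated assumptions. The names `first_mismatch` and `golden_simulate` are hypothetical, the state is reduced to a small dictionary, and the "golden" model is a toy that stands in for a real functional simulator.

```python
from typing import Callable, Dict, List, Optional

State = Dict[str, int]  # hypothetical stand-in for registers and memory


def golden_simulate(state: State, n: int) -> State:
    """Toy golden model (assumption): each retired instruction
    adds 2 to 'acc' and 1 to 'retired'."""
    return {"retired": state["retired"] + n, "acc": state["acc"] + 2 * n}


def first_mismatch(
    checkpoints: List[State],
    simulate: Callable[[State, int], State],
    n: int,
) -> Optional[int]:
    """Return the index of the first checkpoint the simulator fails to
    reproduce, or None if every checkpoint matches."""
    for i in range(len(checkpoints) - 1):
        # Steps 4 and 7: seed the simulator from the dumped checkpoint,
        # not from the previously simulated state.
        predicted = simulate(checkpoints[i], n)  # step 5
        if predicted != checkpoints[i + 1]:      # step 6
            return i + 1
    return None


if __name__ == "__main__":
    dumped = [
        {"retired": 0, "acc": 0},
        {"retired": 100, "acc": 200},
        {"retired": 200, "acc": 401},  # deliberately wrong value
        {"retired": 300, "acc": 601},
    ]
    print(first_mismatch(dumped, golden_simulate, n=100))
```

Because each interval is seeded from the dumped checkpoint, a mismatch is confined to the N-instruction window in which it first appears. In the example, only checkpoint 2 is reported even though the erroneous value carries forward in the real part's state.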
In addition to the memory footprint and register state, the input to the simulator includes information about the occurrence of events generated by agents outside the processor. For example, interrupt requests are sent to the processor. Additionally, other agents in the system, which may be I/O devices or other processors, read from and write to memory that the processor shares with them. These events occur on the architectural processor bus shared by the various agents. They can therefore be captured by a logic analyzer connected to the bus and correlated in time with the dumping of the state checkpoints to memory, which also appears on the bus.
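One way to picture the time correlation is to assign each captured bus event to the checkpoint interval in which it occurred. The sketch below is an illustration only: the function name `bucket_events`, the timestamp units, and the event descriptions are assumptions, not a logic analyzer's actual output format.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

Event = Tuple[int, str]  # (timestamp, description), hypothetical format


def bucket_events(
    checkpoint_times: List[int], events: List[Event]
) -> Dict[int, List[Event]]:
    """Group events by checkpoint interval.

    Interval i starts when checkpoint i is dumped and ends just before
    checkpoint i + 1 is dumped. checkpoint_times must be sorted.
    Events seen before the first checkpoint are ignored.
    """
    buckets: Dict[int, List[Event]] = {i: [] for i in range(len(checkpoint_times))}
    for ts, desc in events:
        # Index of the last checkpoint dumped at or before this event.
        i = bisect_right(checkpoint_times, ts) - 1
        if i >= 0:
            buckets[i].append((ts, desc))
    return buckets


if __name__ == "__main__":
    dumps = [100, 200, 300]
    captured = [(50, "INTR"), (150, "I/O write"), (200, "snoop"), (350, "INTR")]
    for interval, evs in bucket_events(dumps, captured).items():
        print(interval, evs)
```

The simulator could then be fed, for each interval, only the external events that fall inside it.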
In the case of a dual-core processor, actions by one core may affect the function of the other core. For example, memory accesses by one core may affect operation of the other core. In particular, some bugs occur only during interaction between the two cores.
A problem has been detected in the process of debugging a dual-core processor using a simulator. Specifically, each core in the actual processor part independently performs the tracer stops, dumps, and restarts described above in steps 1 and 2. Consequently, the state checkpoints generated by the two cores during operation of the actual part do not necessarily correlate in time with one another. Additionally, some bugs related to core interaction could not be reproduced, likely because the tracer stops and restarts of the two cores were not coordinated.
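The lack of time correlation can be illustrated with a toy calculation. It assumes, purely for illustration, that each core retires instructions at a constant but different rate and that the dump itself takes no time; the function name `checkpoint_cycles` and the rates are hypothetical.

```python
from typing import List


def checkpoint_cycles(n: int, cycles_per_instruction: int, count: int) -> List[int]:
    """Clock cycle at which each of the first `count` checkpoints is
    dumped, assuming a constant retirement rate (a simplification)."""
    return [k * n * cycles_per_instruction for k in range(1, count + 1)]


if __name__ == "__main__":
    n = 100_000
    core0 = checkpoint_cycles(n, cycles_per_instruction=1, count=3)
    core1 = checkpoint_cycles(n, cycles_per_instruction=2, count=3)
    # Skew between the k-th checkpoint of each core grows with k.
    skew = [b - a for a, b in zip(core0, core1)]
    print(core0)
    print(core1)
    print(skew)
```

Under these assumptions the k-th checkpoints of the two cores drift further apart with every interval, so the k-th checkpoint of one core does not describe the same moment as the k-th checkpoint of the other.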