The present invention addresses the need to acquire a real-time trace of program execution from a highly integrated microprocessor. Typically, users wish to obtain a "trace," or listing, of exactly which instructions execute during each clock cycle for a limited period of time during the execution of a program, in order to debug the program or analyze its performance. A "real-time" trace is one that can be acquired while the program runs at normal speed, in the actual system environment, and can be triggered by some system event recognized by the trace acquisition system. Note that since any buffer used to acquire a trace will have a finite number of entries, likely much smaller than the number of clocks consumed in the execution of the program, the trace acquisition system must be able to selectively retain only the information for the clock cycles of interest, i.e., those just before and just after the "trigger" event ("TE"). Further, the system must provide a means for synchronizing the TE with the contents of the trace buffer so that the user can tell exactly which instructions were executing during the clock cycle in which the TE occurred. A "non-invasive" trace is one that can be acquired without disturbing the timing behavior of the program relative to its behavior when not being traced.
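The selective-retention requirement described above can be illustrated with a minimal software sketch. This is not the hardware of the invention; it is a hypothetical model in which the class name `TraceBuffer` and the parameters `depth` and `post_trigger` are assumptions introduced only for illustration. The buffer keeps overwriting its oldest entry until the TE is seen, then records a fixed number of further cycles and freezes, so the retained window brackets the TE and the TE cycle is known.

```python
from collections import deque


class TraceBuffer:
    """Hypothetical model of a finite trace buffer centered on a trigger event."""

    def __init__(self, depth, post_trigger):
        if not 0 <= post_trigger < depth:
            raise ValueError("post_trigger must be in [0, depth)")
        # A bounded deque discards its oldest entry when full.
        self.entries = deque(maxlen=depth)
        self.post_trigger = post_trigger
        self.remaining = None      # None until the trigger event is seen
        self.trigger_cycle = None  # cycle number at which the TE occurred

    def clock(self, cycle, info, trigger=False):
        """Record one clock cycle; return False once the buffer is frozen."""
        if self.remaining == 0:
            return False
        self.entries.append((cycle, info))
        if self.remaining is None:
            if trigger:
                # Remember the TE cycle so it can be matched to the buffer.
                self.trigger_cycle = cycle
                self.remaining = self.post_trigger
        else:
            self.remaining -= 1
        return True


# Usage: an 8-entry buffer that keeps 3 cycles after the trigger.
buf = TraceBuffer(depth=8, post_trigger=3)
for c in range(100):
    buf.clock(c, f"insn@{c}", trigger=(c == 50))
```

In this sketch the ratio of `post_trigger` to `depth` decides how much of the retained window lies before the TE and how much after it.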
A difficulty in acquiring a trace from a highly integrated processor stems from the invisibility of most of the signals required to derive the trace. A typical approach to deriving an instruction trace requires one to determine the location of an instruction being executed on a particular clock cycle (i.e., at the start of the trace), and then to determine, for each subsequent clock cycle, how many instructions are executed, whether any branches among them are taken, and the target addresses of the taken branches.
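The derivation just described can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the method of any particular processor: instructions are assumed to be a fixed 4 bytes wide, a taken branch is assumed to be the last instruction executed in its cycle, and the function name `reconstruct_trace` and the record layout `(n_executed, branch_taken, target)` are hypothetical.

```python
INSN_BYTES = 4  # assumption: fixed-width instructions


def reconstruct_trace(start_addr, cycles):
    """Rebuild per-cycle instruction addresses from exported trace data.

    cycles: iterable of (n_executed, branch_taken, target), one per clock.
    Returns a list with one list of instruction addresses per clock.
    """
    pc = start_addr
    trace = []
    for n_executed, branch_taken, target in cycles:
        # Instructions executed this cycle are sequential from the current pc.
        trace.append([pc + i * INSN_BYTES for i in range(n_executed)])
        if branch_taken and n_executed:
            # Assumption: the taken branch ends the cycle's instruction group.
            pc = target
        else:
            # Not-taken branches and ordinary instructions fall through.
            pc += n_executed * INSN_BYTES
    return trace


# Usage: two instructions, a stall cycle, a taken branch, then two more.
example = reconstruct_trace(
    0x100,
    [(2, False, None), (0, False, None), (1, True, 0x200), (2, False, None)],
)
```

The sketch shows why all three pieces of information are needed: without the start address, the per-cycle count, or the branch outcome and target, the address sequence cannot be recovered.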
Because the processor has an integrated instruction cache, the instruction address bus is not accessible externally and hence individual instruction fetches cannot normally be seen. Also, the signals that indicate the number of instructions executed each cycle and the direction taken by conditional branches are not usually available externally to the integrated circuit ("IC"). Therefore, some information must normally be exported from the microprocessor in order to acquire the trace. This information should appear on the external pins of the IC, either on pins that are already used for other purposes, such as external data and address buses, or on pins dedicated to the tracing function.
Multiplexing trace data onto existing pins has two potential problems. If the trace runs all the time, it will contend for system resources (e.g., bus bandwidth), degrading performance to support a feature that is only used during software debug operations. If the trace data is switched on only when acquiring a trace, it may affect the timing of the program by delaying the processor's normal access to the shared pins, and thus will be intrusive. Dedicated pins can alleviate this problem; however, to maintain low cost of the IC, the pin count must be kept as low as possible.
A previous invention, disclosed within the cross-referenced patent application, described a set of hardware additions made to a microprocessor to provide a non-intrusive, real-time trace capability with low additional cost to the processor IC. However, that solution had the following deficiencies:
(1) It could only trace forward from a TE. That is, once the TE was recognized, trace information was provided to reconstruct an instruction trace from the clock on which the TE occurred and for some finite number of clock cycles (dictated by the depth of the external trace acquisition buffer) after the TE. When debugging, a software engineer may often wish to trigger the capture of the trace when some extraordinary error or event happens, and then to see a trace of the instructions that preceded the unexpected event, to determine what caused the event. For example, one might wish to acquire a trace whenever the processor vectors to an error exception handling routine. In order to determine the cause of the error, one must use the trace of instructions before the error was recognized. The instructions executed after the error occurs are just those of the exception handling routine, and tracing them will be of little use in determining the cause of the error.
(2) It could only indicate a single TE on the output pins. The ability to indicate multiple TEs is useful if the user wants to count TEs and retain the trace information for the time period around the Nth TE.
(3) The partitioning of the solution did not lend itself to reducing cost in a "CORE+ASIC" environment. In this type of design environment, a central processing unit ("CPU") is provided as a large "macro" or "mega-cell" to be used as an element of an Application Specific Integrated Circuit ("ASIC"). The CPU is a "hard macro"; that is, it is a physical design implementation that is placed onto the ASIC as a whole and is not subject to any type of changes or physical optimizations. Since some ASICs may need support for tracing and some may not, it is desirable to add as little hardware to the CPU as possible and allow for another macro block or some part of the ASIC logic to implement the bulk of the additional logic necessary to support trace operations. In this manner, one could easily remove the logic used to support tracing when it is not required on a particular ASIC. The previous solution described within the cross-referenced patent application used three registers in the CPU dedicated to the tracing function; removing them from the CPU is desirable.
(4) The processor operation had to be stopped in order to read the dedicated registers. Stopping the processor operation may be inconvenient or impossible. For example, if it was desired to acquire several trace fragments over the time that the processor runs a relatively long task, the processor could not be stopped to retrieve the information from the dedicated registers without affecting the application that was being traced.
(1) The present invention allows for trace acquisition both before and after a triggering event ("TE") is recognized by the system.
(2) Multiple TEs can be indicated by the CPU and counted by the external trace gathering system. Former trace acquisition systems started broadcasting trace information when the first TE occurred, and only that one TE was indicated. Multiple TEs are useful, for example, if a user wishes to trace the Nth time through a certain section of code.
(3) Some dedicated hardware is removed from the CPU and replaced with hardware that can be easily partitioned from the CPU, thus making the solution less costly for CORE+ASIC products that do not require the tracing capability.
(4) Stopping the processor to read the dedicated registers is not required. The trace pins can be examined and the information on these pins retrieved "on-the-fly". As a result, it is possible to acquire several trace fragments over the time that the processor runs a relatively long task without stopping the processor, which avoids affecting the application that is being traced.
Thus, there is a need in the art for an improved tracing operation for an integrated processor that addresses the above four issues.