1. Field of the Invention
The field of the invention relates to data processing and in particular to diagnostic mechanisms for monitoring data processing operations.
2. Description of the Prior Art
There are a number of situations in which it is desirable to keep track of the processing performed by a processing circuit, and in such situations it may be desirable to identify the order in which instructions are processed and to determine, at any point in time, which instruction is being processed. Such information is useful, for example, during the development of data processing systems, where it is often desirable to track the activity of the processing circuit. One tool that may be used to assist in this process is a tracing tool.
Tracing the activity of a data processing system, whereby a trace stream is generated that includes data representing the step-by-step activity within the system, is a highly useful tool in system development. Such tracing tools use a variety of means for tracing the program flow, including embedded trace macrocells (ETM, a trademark of ARM Limited, Cambridge), which are present on the chip whose processing is being monitored.
Most processor instruction set architectures include branch instructions that are conditional on the state of the data processing system at the point where the branch is processed; that is, such a branch executes and transfers control to its destination if some condition is true, and does not execute, continuing to the next sequential instruction, if the condition is false, in which case it is treated as a no-op. Most instruction set architectures also include an indirect branch instruction, whose destination is calculated from the current state of the data processing system at the point where the branch is processed. The ARM® instruction set architecture, which is documented in the ARM Architecture Reference Manual, ISBN 0-201-73719-1 of 2001, also includes conditional instructions that are not branch instructions and that either execute or not depending on the current state of the processor at the point when the instruction is processed. Other instruction set architectures also include such conditional instructions, sometimes referred to as predicated instructions. Instructions that are not conditional, that is, those that always execute, are referred to as unconditional instructions.
Current protocols used on Embedded Trace Macrocells for non-cycle-accurate trace of existing ARM® (registered trade mark of ARM Limited, Cambridge) processor cores have evolved from those used for cycle-accurate trace. Thus, for every instruction in a stream, the ETM codes the information from the CPU as either an E-atom (when the instruction is executed) or an N-atom (when the instruction is not executed). The ETM then emits a data stream containing the sequence of E-atoms and N-atoms that occurred. Generally these are emitted in a compressed form, using encoding techniques such as run-length encoding.
This is described in the ARM Embedded Trace Macrocell Architecture Specification, ARM IHI 00141 of December 2002.
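A minimal sketch of the E/N-atom idea, assuming a simplified model in which each traced instruction yields a single executed/not-executed flag and the resulting atom sequence is run-length encoded as (atom, count) pairs. The function names and the pair representation are hypothetical; this is not the packet format defined in the ETM specification.

```python
from itertools import groupby
from typing import Iterable, List, Tuple


def to_atoms(executed_flags: Iterable[bool]) -> str:
    """Map each instruction's execution status to an E-atom or an N-atom."""
    return "".join("E" if executed else "N" for executed in executed_flags)


def run_length_encode(atoms: str) -> List[Tuple[str, int]]:
    """Collapse consecutive identical atoms into (atom, run_length) pairs."""
    return [(atom, len(list(run))) for atom, run in groupby(atoms)]


def run_length_decode(runs: List[Tuple[str, int]]) -> str:
    """Expand (atom, run_length) pairs back into the full atom sequence."""
    return "".join(atom * count for atom, count in runs)
```

For example, `run_length_encode(to_atoms([True, True, True, False, True]))` yields `[('E', 3), ('N', 1), ('E', 1)]`: five atoms collapse into three pairs, and long runs of executed instructions become correspondingly cheaper to emit.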
This data stream can be stored either on-chip or off-chip and can then be fed to a debug agent program called an ETM decompressor. The decompressor has a copy of the program being traced, so by decoding the E-atoms and N-atoms, together with other information in the data stream that encodes data-dependent changes to program flow (such as indirect branches), it can reconstruct the program flow in the embedded CPU.
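The reconstruction step can be sketched as follows. This is a hypothetical model, not a real decompressor API: the program copy is assumed to be a mapping from instruction index to a small `Instr` record, every instruction is assumed to contribute exactly one atom, addresses are modeled as consecutive integers rather than byte addresses, and the destinations of executed indirect branches arrive as a separate list standing in for the other information in the data stream.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Instr:
    """Hypothetical static view of one instruction in the decompressor's program copy."""
    mnemonic: str
    branch_target: Optional[int] = None  # direct branch destination, if any
    indirect: bool = False               # destination known only from trace data


def reconstruct(program: Dict[int, Instr], start: int, atoms: str,
                indirect_targets: List[int]) -> List[int]:
    """Replay E/N atoms against the program copy; return the addresses processed."""
    pc = start
    targets = iter(indirect_targets)
    path: List[int] = []
    for atom in atoms:
        instr = program[pc]
        path.append(pc)
        executed = atom == "E"
        if executed and instr.indirect:
            pc = next(targets)            # data-dependent destination from the stream
        elif executed and instr.branch_target is not None:
            pc = instr.branch_target      # taken direct branch
        else:
            pc += 1                       # sequential, or condition failed (no-op)
    return path
```

With a program in which address 1 holds a conditional branch to address 4 and address 3 holds an indirect branch, the atoms `ENEE` replay as addresses 0, 1, 2, 3 (branch not taken), whereas `EENE` replay as 0, 1, 4, 5 (branch taken, then a failed conditional instruction at address 4).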
As data processor cores increase in operating frequency, and as processors having multiple cores become more common, there is a need to improve the debug and tracing tools and mechanisms used in the development of data processing systems. Increasing core frequencies pose a particular problem for trace. For example, existing ARM processor cores and ETM protocols achieve a bit rate of about 1.2 to 1.6 bits per instruction with instruction-only trace. A 1 GHz processor processing one instruction per cycle would therefore generate 1.2 to 1.6 gigabits per second of trace data, and this data may need to be taken off-chip and stored in a buffer. Furthermore, multi-processor systems multiply this data rate by an integer factor.
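The figures above are a simple product of clock rate, instructions per cycle, bits per instruction and number of traced cores. A minimal sketch, assuming every core runs at the same rate and produces trace at the same average density (the helper name is hypothetical):

```python
def trace_bandwidth_gbps(clock_ghz: float, instr_per_cycle: float,
                         bits_per_instr: float, cores: int = 1) -> float:
    """Return the aggregate trace data rate in gigabits per second.

    Assumes all `cores` run at `clock_ghz`, process `instr_per_cycle`
    instructions per cycle, and average `bits_per_instr` bits of trace each.
    """
    # GHz x instructions/cycle = giga-instructions/s; x bits/instruction = Gbit/s.
    return clock_ghz * instr_per_cycle * bits_per_instr * cores
```

For a single 1 GHz core at one instruction per cycle this gives 1.2 Gbit/s at 1.2 bits per instruction and 1.6 Gbit/s at 1.6 bits per instruction; a hypothetical four-core system at the upper figure would need 6.4 Gbit/s.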
Compression is used to reduce the average number of bits used to trace an individual instruction. However, as ever faster cores need to be traced, it would be advantageous to be able to reduce this data rate further.
Another known way of tracing the activity of a data processing system is that embodied in the data processing system produced by Intel® under the name XScale®. This is described in the Intel Developer's Manual of January 2004 entitled Intel® XScale® Core.
In this trace mechanism, instead of outputting details of every instruction that executes or does not execute, XScale® counts the instructions that are processed until it reaches a branch instruction that is executed. It then outputs the number of instructions processed and information as to where the program has branched. Consequently, if it passes a conditional branch instruction that does not execute, no information on that instruction is output since, unlike the conventional trace mechanism described above, it never outputs non-execution indicators. One disadvantage of XScale® is that, because no information is output for conditional branch instructions that do not execute, the indicator for a conditional branch instruction that does execute must contain enough information for the trace decompressor to determine which of the possible branch instructions in the instruction stream is the one that executed; this takes the form of a counter requiring many bits to encode. In addition, because this counter counts all processed instructions other than executed branch instructions, overflows of the counter are likely, and the trace stream must therefore also encode overflow markers. A second disadvantage is that it outputs information only for branch instructions, and not for other conditional instructions; the trace is therefore incomplete, which limits the number of situations in which it is useful.
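The counting scheme can be modeled in a few lines. This is a simplified sketch, not Intel's trace message format: each processed instruction is represented as a hypothetical `(is_branch, executed, target)` tuple, and the default counter limit of 255 is an assumed value chosen only for illustration. The model exhibits both disadvantages described above: a not-taken conditional branch or a failed conditional instruction is indistinguishable from any other counted instruction, and long runs without an executed branch force overflow markers into the stream.

```python
from typing import Iterable, List, Optional, Tuple

# (is_branch, executed, target) for one processed instruction; hypothetical model.
Step = Tuple[bool, bool, Optional[int]]


def count_trace(steps: Iterable[Step], counter_max: int = 255) -> List[tuple]:
    """Emit a message only when a branch executes; count everything else.

    ("BRANCH", n, target): n counted instructions since the previous message,
                           followed by an executed branch to `target`.
    ("OVERFLOW",):         the counter reached `counter_max` and was reset, so
                           the decoder must add `counter_max` to the next count.
    """
    out: List[tuple] = []
    count = 0
    for is_branch, executed, target in steps:
        if is_branch and executed:
            out.append(("BRANCH", count, target))
            count = 0
        else:
            # Not-taken branches and all non-branch instructions, executed or
            # not, are only counted: no per-instruction information is emitted.
            count += 1
            if count == counter_max:
                out.append(("OVERFLOW",))
                count = 0
    return out
```

For instance, an executed instruction, a not-taken conditional branch, a failed conditional instruction and then a taken branch to address 64 produce the single message `("BRANCH", 3, 64)`; the decompressor learns only that three instructions preceded the taken branch.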