1. Field of the Invention
The present invention relates to techniques for generating a trace stream indicative of activities of monitored circuitry of a data processing apparatus.
2. Background of the Invention
Tracing the activity of a data processing system, whereby a stream of trace elements is generated that includes data representing the step-by-step activity within the system, is a highly useful tool in system development. However, with the general move towards more deeply embedded processor cores, it becomes more difficult to track the activities of the processor core or other on-chip devices via externally accessible pins. Accordingly, as well as off-chip tracing mechanisms for capturing and analyzing trace data, increased amounts of tracing functionality are being placed on-chip. An example of such on-chip tracing mechanisms is the Embedded Trace Macrocell (ETM) provided by ARM Limited, Cambridge, England, in association with a variety of their ARM processors.
Such tracing mechanisms produce in real-time a trace stream providing trace elements representing activities of the data processing apparatus that are desired to be traced. This trace stream can then subsequently be analyzed for a variety of purposes, for example to facilitate debugging of sequences of processing instructions being executed by the data processing apparatus, for performing profiling operations in order to determine the performance of particular program code being executed on the data processing apparatus, etc.
Typically, the stream of trace elements that is generated by the trace mechanism is buffered prior to output for subsequent analysis. Such a trace buffer is able to store a finite amount of information and requires a dedicated data bus which has a finite bandwidth over which the elements to be buffered can be received. The trace buffer is generally arranged to store information in a wrap-around manner, i.e. once the trace buffer is full, new data is typically arranged to overwrite the oldest data stored therein. It has been found that the bandwidth of the dedicated data bus limits the rate at which information can be stored in the trace buffer.
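The wrap-around behaviour described above can be illustrated with a minimal Python sketch. The class name, method names and capacity are hypothetical choices made for illustration; a real trace buffer is a hardware structure, and this sketch models only the rule that new data overwrites the oldest data once the buffer is full.

```python
from collections import deque


class WrapAroundTraceBuffer:
    """Fixed-capacity buffer; once full, each new element overwrites the oldest."""

    def __init__(self, capacity):
        # A deque with maxlen discards its oldest entry when a new one
        # is appended to a full buffer.
        self._elements = deque(maxlen=capacity)

    def store(self, element):
        self._elements.append(element)

    def drain(self):
        """Return the buffered elements, oldest first, and empty the buffer."""
        contents = list(self._elements)
        self._elements.clear()
        return contents


buf = WrapAroundTraceBuffer(capacity=4)
for element in ["e0", "e1", "e2", "e3", "e4", "e5"]:
    buf.store(element)

# Only the four most recent elements survive; e0 and e1 were overwritten.
remaining = buf.drain()
```

In this sketch the analysis tool would see only the most recent elements, which is why the finite buffer size bounds the amount of trace that can usefully be generated.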
Typically, a trace analyzing tool is provided which receives the trace stream of trace elements from the trace buffer when desired, for example once the trace has completed. The trace analyzing tool can then be used to reconstruct the activities of the device being traced based on the received trace elements. As devices such as processor cores increase in power and complexity, it is clear that the amount of information required to track the activities of such devices will increase, and accordingly there will potentially be a very large volume of trace elements that need to be generated.
However, there is a problem that there is finite bus bandwidth over which the trace elements can be output by the trace logic, and any trace buffer used to buffer such trace elements will have a finite size. Accordingly, the volume of trace elements that can be generated is limited.
The activities of a device that it might be desirable to trace include, but are not limited to, the instructions being executed by a processor core (referred to as instruction trace), and the memory accesses made by those instructions (referred to as data trace). These activities may be individually traced or traced together, so that the data trace can be correlated with the instruction trace. The data trace itself consists of two parts, the memory addresses and the data values, referred to (respectively) as data address trace and data value trace. Again, the existing ETM trace protocols allow for data address and data value tracing to be enabled independently or simultaneously.
To reduce the volume of trace data that needs to be output in the trace stream, it is known to subject the trace elements to compression techniques prior to output in the trace stream. In particular, sequences of trace elements can be subjected to an encoding operation in order to produce a packet whose bit pattern represents that sequence of trace elements, with the packet then being output in the trace stream. A compression scheme will typically be defined providing a number of different encoding formats that can be used to encode the trace elements. As an example, each packet output will typically consist of a header portion and an optional payload portion. Considering the headers, these typically take the form of a byte of information, and accordingly there are 256 possible header encodings. Some of those encodings may be associated with one encoding format, whilst others of those encodings may be associated with different encoding formats. Further encoding formats may be provided for the payload portions. These different encoding formats may be used to encode the different types of trace element that need to be encoded into packets, but these different encoding formats will be non-overlapping in the bit pattern encoding space so as to enable each packet to be uniquely identified by the trace analyzing tool used to analyze the trace stream.
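The requirement that encoding formats occupy non-overlapping regions of the 256-value header space can be sketched as follows. The format names and bit patterns here are hypothetical and chosen purely for illustration; they are not the encodings of any actual ETM protocol. Each format is described by an assumed (mask, value) pair, and a header byte belongs to a format when its masked bits equal that value.

```python
# Hypothetical formats: name -> (mask, value). Illustrative bit patterns only.
FORMATS = {
    "atom_header": (0b1000_0000, 0b1000_0000),    # 1xxxxxxx
    "branch_header": (0b1100_0000, 0b0100_0000),  # 01xxxxxx
    "data_header": (0b1110_0000, 0b0010_0000),    # 001xxxxx
}


def matching_formats(header):
    """Return the names of all formats whose bit pattern matches this byte."""
    return [
        name
        for name, (mask, value) in FORMATS.items()
        if (header & mask) == value
    ]


def is_non_overlapping():
    """True if no header byte matches more than one format."""
    return all(len(matching_formats(h)) <= 1 for h in range(256))


def classify(header):
    """Identify the format of a header byte, as an analysis tool would."""
    names = matching_formats(header)
    return names[0] if names else "unassigned"


formats_ok = is_non_overlapping()
kinds = [classify(h) for h in (0x84, 0x41, 0x20, 0x00)]
```

Because every byte value matches at most one format in this sketch, a decoder can identify each packet from its header alone. Adding a format whose pattern overlapped an existing one would make some headers ambiguous.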
By way of example, the existing ETM protocols produce trace elements called atoms which indicate whether instructions have been executed or not, an E atom indicating that an instruction has been executed and an N atom indicating that an instruction has not been executed. Sequences of these atoms can be compressed into packets known as p-headers, a p-header consisting of a byte-sized header and no payload. One or more encoding formats may be provided for producing p-headers for particular sequences of E and N atoms, but each such encoding format will occupy a non-overlapping bit pattern encoding space with respect to any other encoding formats used for p-headers, and indeed with respect to any other encoding formats used for different types of headers, for example branch headers used to identify packets giving information about branch instructions, data address or data value headers used to identify packets containing data address or data value information, etc.
The compression scheme used in association with any particular trace circuitry will typically be fixed at design time, with the particular encoding formats provided in that compression scheme having been chosen based on expected patterns of trace elements which will need to be compressed. Hence, by way of example, considering the earlier-mentioned p-headers, the encoding formats used to produce those p-headers from the sequences of E and N atoms typically use run length encoding schemes where the encoding skews towards long sequences of E atoms since these are typically more common than N atoms.
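A run length encoding skewed towards long sequences of E atoms can be sketched as below. The bit layout is an assumption made for illustration and is not presented as the actual ETM p-header format: each one-byte header is assumed to hold a run of 0 to 15 E atoms optionally followed by a single N atom. The function names are likewise hypothetical.

```python
def encode_atoms(atoms):
    """Pack a string of 'E'/'N' atoms into hypothetical one-byte headers.

    Assumed layout: bit 7 = 1 (atom header marker), bit 6 = N atom present,
    bits 5:2 = number of E atoms (0-15), bits 1:0 = 0.
    """
    if set(atoms) - {"E", "N"}:
        raise ValueError("atoms must contain only 'E' and 'N'")
    headers = []
    i = 0
    while i < len(atoms):
        # Consume a run of up to 15 E atoms.
        e_count = 0
        while i < len(atoms) and atoms[i] == "E" and e_count < 15:
            e_count += 1
            i += 1
        # Optionally consume one N atom that terminates the run.
        has_n = i < len(atoms) and atoms[i] == "N"
        if has_n:
            i += 1
        headers.append(0x80 | (int(has_n) << 6) | (e_count << 2))
    return bytes(headers)


def decode_atoms(data):
    """Recover the atom sequence from headers produced by encode_atoms."""
    atoms = []
    for header in data:
        atoms.append("E" * ((header >> 2) & 0xF))
        if header & 0x40:
            atoms.append("N")
    return "".join(atoms)


sequence = "E" * 10 + "N" + "EEE" + "N"   # 15 atoms
packed = encode_atoms(sequence)            # fits in 2 header bytes
restored = decode_atoms(packed)
```

In this sketch a long run of E atoms ending in an N atom costs one byte, which reflects the design-time expectation that E atoms are far more common than N atoms.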
In addition to being able to selectively enable instruction trace, data address trace, and data value trace, current trace circuitry can also support different trace modes of operation. For example, considering the earlier-mentioned ETM product, a non-cycle accurate trace mode of operation and a cycle accurate trace mode of operation can be provided. To switch from the non-cycle accurate trace mode of operation to the cycle accurate trace mode of operation, or vice versa, any current tracing activity is halted, and a number of the control registers in the trace circuitry are updated to define the new trace mode of operation. As part of that update process, the compression scheme may be changed, to provide a compression scheme better optimised for the new trace mode of operation. For example, considering a cycle accurate trace mode of operation, additional atoms known as W atoms are produced on each clock cycle to provide the cycle accurate timing information, and accordingly the encoding formats used for p-headers in a cycle accurate trace mode of operation are typically different to the encoding formats that would be used in a non-cycle accurate trace mode of operation, to reflect the need to encode sequences of W atoms in the p-headers in addition to E and N atoms. However, again, any compression scheme developed for the cycle accurate trace mode of operation will typically be fixed at design time taking into account expected patterns of trace elements that need to be compressed.
However, within any particular trace mode of operation, efficiency problems can arise when applying the predetermined compression scheme provided for that trace mode of operation. For example, as the complexity and capabilities of the monitored circuitry whose activities are being traced increase, there can be significant variations in the code sequences being executed. Considering again the subject of p-headers, the ARM Thumb-2 instruction set provides an “if then” instruction, which it is found may significantly vary the frequency of N atoms produced, and adversely impact the efficiency of any encoding formats provided for p-headers. As another example, the trace circuitry can be adapted so as not merely to trace all instructions, but instead to only trace particular types of instructions such as branch instructions. When only tracing branch instructions a much higher proportion of N atoms can occur, again adversely affecting compression efficiency.
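The sensitivity of a fixed encoding to the proportion of N atoms can be shown with a small calculation. The sketch below assumes a hypothetical format in which each one-byte header holds up to 15 E atoms optionally followed by one N atom. This format and the atom sequences are illustrative assumptions, not measurements from real trace.

```python
def header_count(atoms, max_e_run=15):
    """Count the one-byte headers needed for a string of 'E'/'N' atoms,
    assuming each header holds up to max_e_run E atoms plus at most one N."""
    count = 0
    e_run = 0
    for atom in atoms:
        if atom == "E":
            if e_run == max_e_run:
                count += 1      # run is full: close this header, start another
                e_run = 0
            e_run += 1
        elif atom == "N":
            count += 1          # an N atom always closes the current header
            e_run = 0
        else:
            raise ValueError("atoms must contain only 'E' and 'N'")
    if e_run:
        count += 1              # flush a trailing run of E atoms
    return count


def atoms_per_byte(atoms):
    return len(atoms) / header_count(atoms)


mostly_e = ("E" * 15 + "N") * 8   # 128 atoms, N atoms are rare
alternating = "EN" * 64           # 128 atoms, every second atom is N
all_n = "N" * 128                 # 128 atoms, N atoms only

ratios = [atoms_per_byte(s) for s in (mostly_e, alternating, all_n)]
```

Under these assumptions the same 128 atoms need 8 bytes when N atoms are rare, 64 bytes when E and N atoms alternate, and 128 bytes when only N atoms occur. This illustrates how an encoding tuned at design time for one atom distribution can become inefficient for another.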
As mentioned earlier, the volume of trace produced is a significant problem, and any inefficiency in the encodings provided by the compression scheme will increase the volume of the trace stream, and hence require more area for on-chip buffering, more hardware for off-chip capture and/or more pins for off-chip real-time trace capture. Accordingly, it would be desirable to provide an improved technique for compressing trace elements prior to output of a trace stream.