Tracing the activity of a data processing system, whereby a stream of trace elements (also called “trace data”) is generated that includes data representing the step-by-step activity within the system, is a highly useful tool in system development. However, with the general move towards more deeply embedded processor cores, it becomes more difficult to track the activities of the processor core or other on-chip devices via externally accessible pins. Accordingly, as well as off-chip tracing mechanisms for capturing and analysing trace data, increased amounts of tracing functionality are being placed on-chip. An example of such an on-chip tracing mechanism is the Embedded Trace Macrocell (ETM) provided by ARM Limited, Cambridge, England, in association with a variety of their ARM processors.
Such tracing mechanisms produce, in real time, a stream of trace elements representing the activities of the data processing apparatus that are to be traced. This trace stream can subsequently be analysed for a variety of purposes, for example to facilitate debugging of sequences of processing instructions being executed by the data processing apparatus, to perform profiling operations in order to determine the performance of particular program code being executed on the data processing apparatus, and so on.
Typically, the stream of trace elements that is generated by the trace mechanism is buffered prior to output for subsequent analysis. Such a trace buffer is able to store a finite amount of information and requires a dedicated data bus which has a finite bandwidth over which the elements to be buffered can be received. The trace buffer is generally arranged to store information in a wrap-around manner, i.e. once the trace buffer is full, new data is typically arranged to overwrite the oldest data stored therein. It has been found that the bandwidth of the dedicated data bus limits the rate at which information can be stored in the trace buffer.
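By way of illustration only, the wrap-around behaviour described above can be sketched in software as follows. This is a minimal model under stated assumptions, not a description of any particular trace buffer hardware: the class name `WrapAroundTraceBuffer` and its methods `store` and `drain` are hypothetical, and the bandwidth of the dedicated data bus is not modelled.

```python
class WrapAroundTraceBuffer:
    """Fixed-capacity buffer: once full, each new element overwrites the oldest.

    Hypothetical illustration only; names and interface are assumptions.
    """

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._slots = [None] * capacity
        self._next = 0    # index of the slot the next element is written to
        self._count = 0   # number of valid elements currently held

    def store(self, element):
        # Write at the current position, then advance with wrap-around.
        self._slots[self._next] = element
        self._next = (self._next + 1) % len(self._slots)
        self._count = min(self._count + 1, len(self._slots))

    def drain(self):
        # Return the buffered elements, oldest first.
        capacity = len(self._slots)
        start = (self._next - self._count) % capacity
        return [self._slots[(start + i) % capacity] for i in range(self._count)]


buf = WrapAroundTraceBuffer(capacity=3)
for element in range(5):
    buf.store(element)
recovered = buf.drain()  # the two oldest elements (0 and 1) have been overwritten
```

In this sketch, storing five elements in a three-entry buffer leaves only the three most recent elements available for later analysis, which is the loss of older trace information that a finite buffer implies.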
Typically, a trace analysing tool is provided which receives the stream of trace elements from the trace buffer when desired, for example once the trace has completed. The trace analysing tool can then be used to reconstruct the activities of the device being traced based on the received trace elements. As devices such as processor cores increase in power and complexity, it is clear that the amount of information required to trace the activities of such devices will increase, and accordingly there will potentially be a very large volume of trace elements that need to be generated.
However, a problem arises in that there is finite bus bandwidth over which the trace elements can be output by trace logic, and any trace buffer used to buffer such trace elements will have a finite size. Accordingly, the volume of trace elements that can be generated is limited.
The activities of a device that it may be desired to trace include, but are not limited to, the instructions being executed by a processor core (referred to as instruction trace), and the memory accesses made by those instructions (referred to as data trace). These activities may be traced individually or together, so that the data trace can be correlated with the instruction trace. The data trace itself consists of two parts, the memory addresses and the data values, referred to as data address trace and data value trace respectively. The existing ETM trace protocols likewise allow data address tracing and data value tracing to be enabled independently or simultaneously.
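Purely as an illustrative sketch, the independent enabling of instruction trace, data address trace and data value trace can be modelled with a set of flags. The names `TraceEnable` and `elements_for_access`, and the tuple form of the trace elements, are assumptions made for this example and do not reflect the actual ETM configuration registers or packet formats.

```python
from enum import Flag


class TraceEnable(Flag):
    """Hypothetical enable flags; each kind of trace can be switched on independently."""
    INSTRUCTION = 1
    DATA_ADDRESS = 2
    DATA_VALUE = 4


def elements_for_access(enable, pc, address, value):
    """Return the trace elements produced for one instruction making one memory access."""
    elements = []
    if TraceEnable.INSTRUCTION in enable:
        elements.append(("instr", pc))
    if TraceEnable.DATA_ADDRESS in enable:
        elements.append(("data_addr", address))
    if TraceEnable.DATA_VALUE in enable:
        elements.append(("data_value", value))
    return elements


# Instruction trace together with data address trace, but without data value trace.
enabled = TraceEnable.INSTRUCTION | TraceEnable.DATA_ADDRESS
traced = elements_for_access(enabled, pc=0x8000, address=0x20000000, value=42)
```

Each additional kind of trace that is enabled adds further elements per traced instruction, which is why the choice of what to trace bears directly on the volume of trace data produced.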
Current ETM logic can also be arranged to provide a cycle accurate mode of operation, in which further information is included within the trace stream to indicate each clock cycle, whereby a clock cycle indication is produced for each trace element generated. When subsequently analysing the trace elements within the trace stream, an indication of the clock cycle in which the associated activity took place within the traced device can be determined. Whilst such a cycle accurate mode of operation can be useful in many situations, it significantly increases the volume of trace data produced. In situations where the volume of trace data being produced is already very large, the further increase in volume resulting from performing cycle accurate trace can cause significant problems having regard to the finite bus bandwidth over which the trace elements can be output by the trace logic, and the finite size of the trace buffer used to buffer such trace elements.
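The effect of a cycle accurate mode on trace volume can be illustrated with the following simplified sketch. It assumes, for illustration only, that each trace element is accompanied by one separate clock cycle indication; the function `encode_trace` and the `"CYCLE"` and `"ELEM"` tags are hypothetical and are not the encoding used by the ETM protocols.

```python
def encode_trace(events, cycle_accurate=False):
    """Build a flat trace stream from (cycle, element) pairs.

    In the assumed cycle accurate mode, a clock cycle indication is emitted
    ahead of each trace element; otherwise only the elements are emitted.
    """
    stream = []
    for cycle, element in events:
        if cycle_accurate:
            stream.append(("CYCLE", cycle))
        stream.append(("ELEM", element))
    return stream


events = [(10, "instr A"), (11, "instr B"), (14, "load X")]
plain = encode_trace(events)
accurate = encode_trace(events, cycle_accurate=True)
overhead = len(accurate) - len(plain)  # extra items attributable to cycle indications
```

Under this assumption the number of items in the stream doubles when cycle accurate mode is enabled, while the analysing tool gains the ability to associate each element with the clock cycle in which the activity occurred.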
Commonly assigned U.S. Pat. No. 7,069,176, incorporated herein by reference, discloses a data processing apparatus generating a trace data stream into which both global timestamps (i.e. with reference to an external clock source) and local timestamps (i.e. with reference to an internal clock) may be added.
It would be desirable to provide an improved technique for generating a stream of trace elements, so as to enable more effective use to be made of the finite bus bandwidth over which the trace elements can be output, and the finite size of any trace buffer in which those trace elements are buffered.