1. Field of the Invention
The present invention relates to techniques for generating a trace stream for a data processing apparatus containing trace elements indicative of the activities of certain logic of the data processing apparatus.
2. Description of the Prior Art
Tracing the activity of a data processing system whereby a stream of trace elements is generated including data representing the step-by-step activity within the system is a highly useful tool in system development. However, with the general move towards more deeply embedded processor cores, it becomes more difficult to track the activities of the processor core or other on-chip devices via externally accessible pins. Accordingly, as well as off-chip tracing mechanisms for capturing and analyzing trace data, increased amounts of tracing functionality are being placed on-chip. An example of such on-chip tracing mechanisms is the Embedded Trace Macrocell (ETM) provided by ARM Limited, Cambridge, England, in association with a variety of their ARM processors.
Such tracing mechanisms produce in real-time a stream of trace elements representing activities of the data processing apparatus that are desired to be traced. This trace stream can then subsequently be analyzed for a variety of purposes, for example to facilitate debugging of sequences of processing instructions being executed by the data processing apparatus, for performing profiling operations in order to determine the performance of particular program code being executed on the data processing apparatus, etc.
Typically, the stream of trace elements that is generated by the trace mechanism is buffered prior to output for subsequent analysis. Such a trace buffer is able to store a finite amount of information and requires a dedicated data bus which has a finite bandwidth over which the elements to be buffered can be received. The trace buffer is generally arranged to store information in a wrap-around manner, i.e. once the trace buffer is full, new data is typically arranged to overwrite the oldest data stored therein. It has been found that the bandwidth of the dedicated data bus limits the rate at which information can be stored in the trace buffer.
Typically, a trace analyzing tool is provided which receives the stream of trace elements from the trace buffer when desired, for example once the trace has completed. The trace analyzing tool can then be used to reconstruct the activities of the device being traced based on the received trace elements. As devices such as processor cores increase in power and complexity, it is clear that the amount of information required to track the activities of such devices will increase, and accordingly there will potentially be a very large volume of trace elements that need to be traced.
However, there is a problem that there is finite bus bandwidth over which the trace elements can be output by the trace logic, and any trace buffer used to buffer such trace elements will have a finite size. Accordingly, the volume of trace elements that can be generated is limited.
The activities of a device that might want to be traced include, but are not limited to, the instructions being executed by a processor core (referred to as instruction trace), and the memory accesses made by those instructions (referred to as data trace). These activities may be individually traced or traced together, so that the data trace can be correlated with the instruction trace. The data trace itself consists of two parts, the memory addresses and the data values, referred to (respectively) as data address and data value trace. Again, the existing trace ETM protocols allow for data address and data value tracing to be enabled independently or simultaneously.
Experience shows that for existing processor cores and ETM protocols, a bit rate of less than 2 bits per instruction is achieved for instruction tracing only. However, to illustrate the above problem, a bit rate of approximately 10 to 16 bits per instruction is achieved for instruction and data address tracing. Therefore a processor having an operating speed of approximately 1 GHz executing one instruction per cycle will generate approximately 10 to 16 Gbits/s of trace data, all of which must be taken off-chip and captured in a fixed-size buffer. In addition to tracing instructions and data addresses, certain classes of problem also require data value tracing to be performed, and this will further increase the amount of trace data that needs to be generated to over 20 bits per instruction. Collectively, the two elements of data tracing, namely the data address tracing and the data value tracing, contribute to a large proportion of the overall volume of trace elements produced.
Current ETM logic can also be arranged to provide a cycle accurate mode of operation, in which further information is included within the trace stream to indicate each clock cycle, such that when subsequently analyzing the trace elements within the trace stream, an indication of the clock cycle in which the associated activity took place within the traced device can be determined. Whilst such a cycle accurate mode of operation can be useful in many situations, it significantly increases the volume of trace data produced, irrespective of whether instruction trace, data trace, or a combination of both, is being performed. In situations where the volume of trace data being produced is already very large, the extra increase in volume resulting from performing cycle accurate trace can cause significant problems, having regard to the finite bus bandwidth over which the trace elements can be output by the trace logic, and the finite size of the trace buffer used to buffer such trace elements.
Accordingly, it would be desirable to provide an improved technique for generating a stream of trace elements, so as to enable more effective use to be made of the finite bus bandwidth over which the trace elements can be output, and the finite size of any trace buffer in which those trace elements are buffered.