1. Field of the Invention
The present invention relates to the generation of trace elements within a data processing apparatus having one or more devices whose behaviour is to be traced.
2. Description of the Prior Art
Tracing the activity of a data processing system, whereby a stream of trace elements is generated that includes data representing the step-by-step activity within the system, is a highly useful tool in system development. However, with the general move towards more deeply embedded processor cores, it becomes more difficult to track the activities of the processor core or other on-chip devices via externally accessible pins. Accordingly, as well as off-chip tracing mechanisms for capturing and analysing trace data, increased amounts of tracing functionality are being placed on-chip. One example of such an on-chip tracing mechanism is the Embedded Trace Macrocell (ETM) provided by ARM Limited, Cambridge, England, in association with various of their ARM processors.
Such tracing mechanisms produce in real-time a stream of trace elements representing activities of the data processing apparatus that are desired to be traced. This trace stream can then subsequently be analysed for a variety of purposes, for example to facilitate debugging of sequences of processing instructions being executed by the data processing apparatus, or to perform profiling operations in order to determine the performance of particular program code being executed on the data processing apparatus.
Typically, the stream of trace elements that is generated by the trace mechanism is buffered prior to output for subsequent analysis. Such a trace buffer is able to store a finite amount of information and requires a dedicated data bus which has a finite bandwidth over which the elements to be buffered can be received. The trace buffer is generally arranged to store information in a wrap-around manner, i.e. once the trace buffer is full, new data is typically arranged to overwrite the oldest data stored therein. It has been found that the bandwidth of the dedicated data bus limits the rate at which information can be stored in the trace buffer.
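The wrap-around behaviour described above can be illustrated with a minimal sketch. It is not taken from any actual trace buffer implementation; the class name `TraceBuffer`, its methods and the capacity of four elements are assumptions chosen purely for illustration.

```python
from collections import deque


class TraceBuffer:
    """Hypothetical fixed-capacity buffer that overwrites its oldest element when full."""

    def __init__(self, capacity):
        # A deque with maxlen silently discards the oldest entry on overflow,
        # which models the wrap-around storage described in the text.
        self._elements = deque(maxlen=capacity)

    def store(self, element):
        self._elements.append(element)

    def drain(self):
        # Hand the buffered elements to the analysis tool, oldest first.
        contents = list(self._elements)
        self._elements.clear()
        return contents


buf = TraceBuffer(capacity=4)   # assumed capacity, for illustration only
for element in range(6):        # six elements arrive, but only four fit
    buf.store(element)
remaining = buf.drain()         # the two oldest elements (0 and 1) were overwritten
```

In this sketch the earliest trace elements are lost once the buffer wraps, which is why the volume of trace generated relative to the buffer size matters.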
Typically, a trace analysing tool is provided which receives the stream of trace elements from the trace buffer when desired, for example once the trace has completed. The trace analysing tool can then be used to reconstruct the activities of the device being traced based on the received trace elements. As devices such as processor cores increase in power and complexity, it is clear that the amount of information required to track the activities of such devices will increase, and accordingly there will potentially be a very large volume of trace elements that need to be generated.
However, a problem arises in that the bus bandwidth over which the trace elements can be output by the trace logic is finite, and any trace buffer used to buffer such trace elements will have a finite size. Accordingly, the volume of trace elements that can be generated is limited. The bandwidth issue is of particular concern for off-chip trace buffers, although it can also be a concern for on-chip trace buffers. The trace buffer size issue is of particular concern for on-chip trace buffers, where size is at a premium.
The activities of a processor core that it may be desirable to trace include, but are not limited to, the instructions being executed by that processor core (referred to as instruction trace), and the memory accesses made by those instructions (referred to as data trace). These activities may be individually traced or traced together, so that the data trace can be correlated with the instruction trace. The data trace itself consists of two parts, the memory addresses and the data values, referred to (respectively) as data address trace and data value trace. Again, the existing ETM trace protocols allow for data address tracing and data value tracing to be enabled independently or simultaneously.
Experience shows that for existing processor cores and ETM protocols, a bit rate of less than 2 bits per instruction is achieved for instruction tracing only. However, to illustrate the above problem, a bit rate of approximately 10 to 16 bits per instruction is achieved for instruction and data address tracing. Therefore a processor having an operating speed of approximately 1 GHz executing one instruction per cycle will generate approximately 10 to 16 Gbits/s of trace data, all of which will typically need to be captured in a fixed-size buffer, which may be off-chip. In addition to tracing instructions and data addresses, certain classes of problem also require data value tracing to be performed, and this will further increase the amount of trace data that needs to be generated to over 20 bits per instruction. Collectively, the two elements of data tracing, namely the data address tracing and the data value tracing, contribute to a large proportion of the overall volume of trace elements produced.
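The arithmetic behind the figures above can be reproduced with a short sketch. The clock speed, instructions per cycle and bits-per-instruction values are those stated in the text; the function name and the 4 KB buffer size used for the fill-time estimate are assumptions introduced only for illustration.

```python
def trace_rate_bits_per_second(clock_hz, instructions_per_cycle, bits_per_instruction):
    """Trace data rate implied by a given per-instruction trace cost."""
    return clock_hz * instructions_per_cycle * bits_per_instruction


CLOCK_HZ = 1_000_000_000          # approximately 1 GHz, as in the text
INSTRUCTIONS_PER_CYCLE = 1        # one instruction per cycle, as in the text

# Instruction plus data address tracing: approximately 10 to 16 bits per instruction.
low_rate = trace_rate_bits_per_second(CLOCK_HZ, INSTRUCTIONS_PER_CYCLE, 10)
high_rate = trace_rate_bits_per_second(CLOCK_HZ, INSTRUCTIONS_PER_CYCLE, 16)

# Hypothetical 4 KB trace buffer (not a figure from the text), used to show
# how quickly a fixed-size buffer wraps at these rates.
ASSUMED_BUFFER_BITS = 4 * 1024 * 8
seconds_to_fill_at_low_rate = ASSUMED_BUFFER_BITS / low_rate

print(f"{low_rate / 1e9:.0f} to {high_rate / 1e9:.0f} Gbits/s")
print(f"assumed buffer fills in {seconds_to_fill_at_low_rate * 1e6:.2f} microseconds")
```

Under these assumptions even the lower rate fills a small buffer within a few microseconds, which illustrates why reducing the bits generated per instruction is of interest.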
ARM Limited's U.S. patent application Ser. No. 10/452,904, now U.S. Pat. No. 7,197,671, describes a technique where a trace generation unit maintains a table used to identify architectural state derivable from previously generated trace elements, with the trace generation unit then referencing that table in order to determine which trace elements to generate during the trace generation. This can enable the number of trace elements required to be generated to be reduced, since the table provides a record of the architectural state which has already been provided to the recipient of the trace stream. Whilst such an approach provides some benefits in reducing the volume of trace elements produced, it requires the maintenance of a table within the trace logic, and can only start to reduce the amount of trace once the table has been populated to provide a history of architectural state that has already been provided by previous trace elements of the trace stream.
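The general idea of suppressing trace elements that the recipient can already derive may be loosely sketched as follows. This is not the mechanism of the cited patent; the dictionary-based table, the function name `generate_trace` and the register-style event names are all assumptions made for illustration.

```python
def generate_trace(events):
    """Yield only those (location, value) events not already recorded in the table."""
    # Hypothetical table: maps a location to the last value sent in the trace stream.
    known_state = {}
    for location, value in events:
        if location in known_state and known_state[location] == value:
            # The recipient can derive this value from earlier trace elements,
            # so no new trace element is generated.
            continue
        known_state[location] = value
        yield (location, value)


# Assumed example activity: repeated observations of two locations.
events = [("r0", 5), ("r1", 7), ("r0", 5), ("r1", 8), ("r0", 5)]
emitted = list(generate_trace(events))
```

In this sketch the first observation of each location is always emitted, reflecting the limitation noted above that no reduction is possible until the table has been populated.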
Accordingly, it would be desirable to provide an alternative technique for generating a stream of trace elements, which can be readily implemented whilst enabling effective use to be made of the finite bus bandwidth over which the trace elements can be output, and the finite size of any trace buffer in which those trace elements are buffered.