Such signals to be monitored may be passed over buses coupling individual components within the integrated circuit, or alternatively may be signals occurring within individual components assuming monitoring logic can be given access to such signals. It may be desirable to monitor the values of these signals for a variety of purposes. For example, when performing debug operations, it is often useful to monitor values of certain signals in order to seek to detect potential bugs which can then be analysed by a debug tool. Often in such debug applications, it is desirable to detect when certain predetermined values of signals occur and on such occurrence of a predetermined value to halt execution of the program and pass over control to a debug tool.
Another situation where monitoring values of signals is useful is when employing trace mechanisms to trace certain activities of the integrated circuit. In such situations, the occurrence of certain predetermined values of one or more signals can be used to trigger the generation of trace elements for outputting within a trace stream providing an indication of certain activities of the integrated circuit that may be of interest for subsequent analysis.
Another example of an application where monitoring the values of one or more signals occurring within the integrated circuit may be beneficial is profiling, where, for example, the profiling tool may wish to assess the number of times a particular address is accessed, the number of times a particular data value is used, etc.
In accordance with a known technique for monitoring values of particular signals, one or more watchpoint registers are provided for specifying individual values or ranges of values of interest. Such watchpoint mechanisms then compare values of particular signals occurring at a predetermined place within the integrated circuit (for example occurring over a particular bus path) with the values or ranges specified in the one or more watchpoint registers, and in the event of a match, generate a trigger signal. When used in debug applications, this trigger signal may be used, for example, to halt execution of the program and pass over control to the debug application. When used in trace or profile applications, this trigger may be used, for example, to control generation of the appropriate output trace or profile information for routing to a trace analysis tool or profile tool.
The signals being monitored may take a variety of forms, and in one embodiment may identify data addresses and/or data values passing within the integrated circuit. In such instances, the watchpoint logic may for example be coupled to a bus over which a load store unit of a processor communicates with memory. As another example, the signals being monitored may identify instruction addresses, such as may be issued by a prefetch unit of a processor, and in such instances the watchpoint logic may be coupled to a bus over which the prefetch logic issues those instruction addresses. Sometimes, watchpoint logic used to monitor instruction addresses is referred to as breakpoint logic, but herein the term “watchpoint” will be used to collectively refer to either a watchpoint or a breakpoint.
Typical implementations of watchpoint mechanisms provide a number of watchpoint registers which can be programmed with particular addresses. Further, the values in two watchpoint registers can be combined to provide a watchpoint range. However, such implementations have significant limitations. In particular, any hardware implementation provides only a predetermined, limited number of watchpoint registers, and this in turn limits the number of separate values that can be monitored. The user must then manage this constraint carefully to make the most effective use of the hardware resource provided by the fixed number of watchpoint registers.
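By way of illustration only, the register-based watchpoint mechanism described above can be modelled in a few lines of Python. This is a minimal sketch rather than a description of any particular hardware: the class name `WatchpointUnit`, its methods, and the rule that a range consumes two registers while a single value consumes one are assumptions made for the example.

```python
class WatchpointUnit:
    """Toy model of a fixed pool of watchpoint registers (hypothetical API)."""

    def __init__(self, num_registers):
        self.num_registers = num_registers  # fixed by the hardware
        self.used = 0                       # registers already programmed
        self.ranges = []                    # inclusive (low, high) pairs

    def _allocate(self, count):
        # The pool is finite: programming fails once it is exhausted.
        if self.used + count > self.num_registers:
            raise RuntimeError("not enough free watchpoint registers")
        self.used += count

    def watch_value(self, value):
        # A single value of interest occupies one register.
        self._allocate(1)
        self.ranges.append((value, value))

    def watch_range(self, low, high):
        # Assumption: a range is formed by pairing two registers.
        self._allocate(2)
        self.ranges.append((low, high))

    def observe(self, value):
        # Compare an observed signal value against every programmed
        # value or range; True stands in for the trigger signal.
        return any(low <= value <= high for low, high in self.ranges)
```

With three registers, for instance, one single-value watchpoint and one range exhaust the pool, which is the limitation the preceding paragraph describes.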
An alternative approach for monitoring values of particular signals has been to employ a memory management unit (MMU) associated with a particular processing logic to generate trigger signals when particular values are identified. In particular, the MMU has access to page tables identifying particular attributes associated with pages of memory. For a page of memory associated with a value of interest (for example, a page containing a particular address), the entry for that page in the page table can be defined such that when the MMU sees an access to any part of that page, it generates an abort signal, which can be used as a trigger signal in a similar way to the earlier described trigger signals produced by watchpoint logic. Whilst this approach does provide some extra flexibility by allowing more values to be monitored than may be available using standard hardware watchpoint registers, it has the problem that it produces many false hits. In particular, an access to any value within a page of memory that includes a value of interest will cause the abort signal to be generated, and the abort handler must then perform further processing to establish whether the abort occurred due to an access to the particular value of interest or due to an access to a different value within that page. This significantly impacts processing speed (for example, in some implementations it has been shown to slow processing by a factor of 100 to 1000).
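The false-hit behaviour of the MMU-based approach can be illustrated with a small Python model. The sketch below is illustrative only: the 4 KiB page size, the class name `PageProtectionMonitor` and its counters are assumptions made for the example, and the "abort handler" is reduced to a set lookup.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages


class PageProtectionMonitor:
    """Toy model of page-granular monitoring (hypothetical names)."""

    def __init__(self, watched_addresses):
        self.watched = set(watched_addresses)
        # Every page containing a watched address is marked to abort.
        self.protected_pages = {a >> PAGE_SHIFT for a in self.watched}
        self.aborts = 0     # how often the abort handler had to run
        self.true_hits = 0  # aborts caused by a genuinely watched address

    def access(self, address):
        """Return True only when a watched address is accessed."""
        if (address >> PAGE_SHIFT) not in self.protected_pages:
            return False  # unprotected page: no abort, no overhead
        # Any access to a protected page aborts, so the handler must
        # check the exact address to filter out false hits.
        self.aborts += 1
        if address in self.watched:
            self.true_hits += 1
            return True
        return False

    @property
    def false_hits(self):
        return self.aborts - self.true_hits
```

Watching the single address `0x1234` and then accessing `0x1000` in the same page invokes the handler without producing a true hit, which is the overhead the preceding paragraph refers to.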
Another major limitation of using an MMU in this way is that it can only monitor data and instruction addresses produced by the CPU: it cannot monitor data values and it cannot monitor values produced elsewhere in the integrated circuit (e.g., by a DMA engine). A further major limitation is that it can only be used for invasive debugging, tracing and profiling, since the abort signal interrupts the CPU. Non-invasive techniques are generally preferable, since they minimally perturb the behaviour of the system, so that bugs are not masked and trace and profile data accurately reflect how the system would behave when not being monitored.
Nevertheless, in some implementations, despite the significant impact on processing speed, and the inherent inflexibility of such an approach, such MMU-based mechanisms have been used to overcome the inherent limitations of standard hardware watchpoint register mechanisms.
As an alternative to the above-described hardware mechanisms for monitoring values of particular signals, a number of software approaches have also been developed. One such software approach involves the use of instrumentation to generate a modified version of program code for execution, such that the software when executing provides additional information as it runs. Such instrumentation may be static instrumentation performed at compile time, or may be dynamic instrumentation where a sequence of instructions is translated into a modified stream of instructions on the fly at run time. Such instrumentation can be used to add instructions to the instruction sequence to seek to detect the presence of particular values of interest and instigate any required additional processing. As an example, it may be desired to detect whenever a load operation loads data from a particular data address. With such an instrumentation approach, one or more additional instructions can be added following each load instruction to identify whether the address used by that load instruction is the address of interest, and if so to branch to a particular exception routine.
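The load-checking example above can be sketched in Python using a toy instruction format. Everything here is hypothetical and chosen for illustration: instructions are tuples, `instrument` plays the role of the instrumentation pass, and the callback `on_hit` stands in for the exception routine.

```python
def instrument(program):
    """Return a copy of the program with a check added after each load."""
    out = []
    for instr in program:
        out.append(instr)
        if instr[0] == "load":
            # Added instruction: compare the load address with the
            # address of interest when the program runs.
            out.append(("check_addr", instr[2]))
    return out


def run(program, memory, watched_address, on_hit):
    """Interpret the toy program; call on_hit for watched loads."""
    regs = {}
    for instr in program:
        op = instr[0]
        if op == "load":
            _, reg, addr = instr
            regs[reg] = memory.get(addr, 0)
        elif op == "check_addr":
            if instr[1] == watched_address:
                on_hit(instr[1])  # stands in for the exception routine
        elif op == "add":
            _, dst, a, b = instr
            regs[dst] = regs[a] + regs[b]
    return regs
```

Note that the instrumented program is longer than the original (one extra instruction per load), which is the source of the performance cost associated with this approach.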
One such software instrumentation approach is described in the Article “Low-Overhead Interactive Debugging via Dynamic Instrumentation with DISE” by M Corliss et al, Proceedings of the 11th International Symposium on High-Performance Computer Architecture (HPCA-11 2005). When describing such an instrumentation approach for watching multiple addresses, this article indicates that if the number of watched addresses is both large and sparse, the instrumentation software can set up a watched address bitmap similar to a Bloom filter in a static data region, with each store address being hashed into this bitmap. Bloom filters are named after Burton Bloom, who introduced them in his seminal paper entitled “Space/Time Trade-Offs in Hash Coding with Allowable Errors”, Communications of the ACM, Volume 13, Issue 7, July 1970. Their original purpose was to build memory-efficient database applications. In the above-described software instrumentation technique, the additional instructions added by the instrumentation reference the bitmap, with zeros in the bitmap indicating definite negatives, and ones indicating only probable positives. The article notes that this may trigger some spurious calls to the debugger-generated function, but that these should be compensated for by the simplified address checking sequence.
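A Bloom filter of the general kind referred to above can be sketched as follows. This is a generic illustration and not the bitmap layout used in the cited article: the bitmap size, the number of hash functions and the use of SHA-256 to derive bit positions are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over integer addresses (illustrative only)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _indices(self, value):
        # Assumption: derive each bit position from a salted SHA-256 digest.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        # Record a watched address by setting all of its bits.
        for idx in self._indices(value):
            self.bits[idx] = 1

    def might_contain(self, value):
        # Any zero bit means "definitely not watched"; all ones means
        # only "probably watched", so a precise check must follow.
        return all(self.bits[idx] for idx in self._indices(value))
```

A filter built this way never reports a watched address as absent, but it may report an unwatched address as present, which corresponds to the spurious calls mentioned above.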
Whilst such software instrumentation techniques can provide significant flexibility for monitoring values of particular signals, the techniques are relatively complex, due to the instrumentation required to modify the code being executed, and further the additional instructions added to identify particular values of interest adversely impact performance.
The Article “AccMon: Automatically Detecting Memory-Related Bugs via Program Counter-Based Invariants”, by P Zhou et al, Proceedings of the 37th International Symposium on Microarchitecture (MICRO-37 2004), describes a PC-based invariant detection tool that uses a combination of architectural, run-time system, and compiler support to catch hard-to-find memory-related bugs. In the paper, it is observed that, in most programs, a given variable is typically accessed by only a few instructions, and hence based on this observation the paper describes identifying the set of program counter (PC) values that normally access a given key variable, which may for example be a memory object. Then, the paper describes a check look-aside buffer (CLB) whose purpose is to seek to reduce overhead by filtering most valid accesses to monitored objects. Such valid accesses do not need to trigger the monitoring function. The CLB structure is similar to a cache, in that it contains a number of entries, and for each memory address, the CLB is accessed to see if there is a matching entry in the CLB. Rather than each entry in the CLB containing a list of the acceptable set of PC values, a Bloom filter vector is instead identified in the entry, and hence a hit in the CLB will identify a Bloom filter vector that is used to test whether the program counter of the instruction issuing that memory address falls within the acceptable set.
Using the PC value, the identified Bloom filter vector is accessed directly using predetermined bits of the PC value, and if any accessed bit in the Bloom filter vector is zero, it is determined that the PC value does not belong to the acceptable set of PC values for that memory address. Otherwise the element may belong to the set. If it is determined that a bit accessed in the Bloom filter vector is zero, and hence the PC value definitely does not belong to the set, then a trigger is issued to trigger the monitoring function. However, otherwise no trigger is produced and it is assumed that the PC value is acceptable. By the nature of the Bloom filter, the assumption that the PC value is acceptable is not definitive, and it is possible in fact that the PC value may not have been within the acceptable set. Nevertheless, in the specific implementation described in this article, the view is taken that the probability of false positives is sufficiently low that this does not prove a problem.
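The check described above, in which a zero bit triggers the monitoring function and an all-ones result is assumed to be acceptable, can be sketched in Python as follows. The sketch is illustrative only: the vector is held in a Python integer, and the two bit fields of the PC selected by `FIELDS` are hypothetical, since the choice of predetermined PC bits is not given here.

```python
# Assumption: two 6-bit fields of the PC each index a 64-bit vector.
FIELDS = ((2, 6), (8, 6))  # (shift, width) pairs, hypothetical


def pc_indices(pc):
    """Bit positions selected directly by predetermined PC bits."""
    return [(pc >> shift) & ((1 << width) - 1) for shift, width in FIELDS]


def build_vector(acceptable_pcs):
    """Set the bits for every PC in the acceptable set."""
    vector = 0
    for pc in acceptable_pcs:
        for idx in pc_indices(pc):
            vector |= 1 << idx
    return vector


def should_trigger(vector, pc):
    """Trigger only if some accessed bit is zero (definite non-member)."""
    return any(((vector >> idx) & 1) == 0 for idx in pc_indices(pc))
```

With the acceptable set `{0x1000, 0x1004}`, the PC `0x2008` selects a zero bit and triggers, whereas the PC `0x140` happens to select only bits already set and so does not trigger, even though it is not in the acceptable set. This is the kind of missed occurrence that the Bloom filter makes possible in this arrangement.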
One problem with the approach described in the above article is that it will not identify all occurrences of values of interest, which in the case of that article are any PC values not within the acceptable set of PC values. Whilst this is considered acceptable having regard to the particular problem with which that article is concerned, it would not generally be considered an acceptable approach when seeking to adopt a more flexible alternative to the earlier described watchpoint mechanisms, where it will typically not be acceptable to allow any watchpoint to be missed. Another problem is that the CLB structure and the Bloom filters within it can only be used to monitor pairs of an instruction address and a data address. They cannot be used to monitor instruction addresses alone, data addresses alone, data values, or values arising outside of the CPU (e.g., generated by a DMA engine).
Accordingly, it would be desirable to develop an improved hardware technique for enabling watchpoint values to be reliably monitored, but without the inherent limitations associated with typical watchpoint register mechanisms.