Field of the Invention
The present invention relates to the field of data processing and in particular to the monitoring of data processing systems.
Description of the Prior Art
Data processing apparatus are becoming increasingly complex, and it is therefore becoming more difficult to analyse their performance, whether for optimisation or for fault finding, without extracting and analysing large amounts of data.
Furthermore, data processing apparatus increasingly have multiple processors. These processors often access the same data storage, so race conditions can arise in which one processor writes to a stored variable at roughly the same time that another processor accesses it. Such problems occur due to insufficient synchronisation between the processors.
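By way of illustration only, the following sketch simulates the kind of race condition described above. It assumes two processors that each increment a shared variable in three separate steps (load, add, store). The function name and the explicit schedule list are hypothetical devices used to make the interleaving deterministic; they do not correspond to any particular apparatus.

```python
def run_interleaving(schedule):
    """Simulate two processors that each increment one shared variable.

    Each processor performs three steps in order: load, add 1, store.
    `schedule` lists which processor (0 or 1) executes its next step.
    """
    shared = {"counter": 0}
    registers = [0, 0]  # one private register per processor
    step = [0, 0]       # index of the next step for each processor
    for cpu in schedule:
        if step[cpu] == 0:    # load the shared value into a private register
            registers[cpu] = shared["counter"]
        elif step[cpu] == 1:  # increment the private copy
            registers[cpu] += 1
        elif step[cpu] == 2:  # write the private copy back to shared storage
            shared["counter"] = registers[cpu]
        step[cpu] += 1
    return shared["counter"]


# Processor 0 completes before processor 1 starts: both increments survive.
serialised = run_interleaving([0, 0, 0, 1, 1, 1])

# The steps alternate: both processors load 0, so one increment is lost.
racy = run_interleaving([0, 1, 0, 1, 0, 1])

print(serialised, racy)
```

With sufficient synchronisation only the first kind of schedule is possible; without it, the second schedule silently loses an update.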
A different but related problem arises when tuning the performance of multiprocessor systems. A programmer needs to understand the performance implications of running two pieces of code at the same time on different processors, arising for example from bus contention.
Furthermore, it may be important to be able to check that functions access only memory within their permitted range and never memory outside it.
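As a minimal sketch of such a check, the following assumes that each memory access is recorded as an (address, length) pair and that the permitted range is given by a base address and a size. The function name and the record format are assumptions made for illustration only.

```python
def out_of_range_accesses(accesses, base, size):
    """Return the accesses that are not wholly inside [base, base + size)."""
    limit = base + size
    return [
        (addr, length)
        for addr, length in accesses
        if addr < base or addr + length > limit
    ]


# Hypothetical access log for a function permitted to use 0x1000..0x10FF.
accesses = [(0x1000, 4), (0x10FC, 4), (0x10FE, 4), (0x0FFC, 4)]
violations = out_of_range_accesses(accesses, base=0x1000, size=0x100)
print([(hex(addr), length) for addr, length in violations])
```

The third access straddles the upper bound and the fourth lies below the base, so both are reported.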
One known way to check for race conditions is by dynamic race detection mechanisms, for example the Eraser system described in Savage et al., “Eraser: a dynamic data race detector for multithreaded programs”, ACM Transactions on Computer Systems (TOCS), vol. 15, issue 4, November 1997, pages 391-411. The Eraser system modifies the program it is monitoring so as to monitor every shared memory reference and to verify that consistent locking behaviour is observed. Runtime modification such as this has two problems associated with it. Firstly, it can cause a substantial slowdown (Eraser typically slows a system by 10 to 30 times), so it cannot be applied in real-time systems. Secondly, it modifies the software and thus cannot be used to detect problems caused by interactions with accelerators that are not programmable (e.g. a DMA controller) or that do not have a rich enough instruction set to express the monitoring code, or in cases where the code cannot be modified for practical or legal reasons.
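The idea of verifying consistent locking behaviour can be sketched as follows. This is a simplified illustration of lockset refinement, in which the set of candidate locks for each shared variable is intersected with the locks held at each access; it omits the refinements of the published Eraser system (such as its handling of initialisation and read-shared data), and the class and method names are hypothetical.

```python
class LocksetChecker:
    """Simplified lockset refinement: warn when no single lock has been
    held during every access to a shared variable."""

    def __init__(self):
        self.held = {}        # thread id -> set of locks currently held
        self.candidates = {}  # variable -> locks held on every access so far
        self.warnings = []    # variables with no consistently held lock

    def acquire(self, thread, lock):
        self.held.setdefault(thread, set()).add(lock)

    def release(self, thread, lock):
        self.held.setdefault(thread, set()).discard(lock)

    def access(self, thread, variable):
        held = self.held.get(thread, set())
        if variable not in self.candidates:
            # First access: every lock held now is a candidate.
            self.candidates[variable] = set(held)
        else:
            # Later accesses: keep only locks that are also held now.
            self.candidates[variable] &= held
        if not self.candidates[variable] and variable not in self.warnings:
            self.warnings.append(variable)


checker = LocksetChecker()

# Variable "y" is always accessed under lock "mu": no warning.
for thread in ("T1", "T2"):
    checker.acquire(thread, "mu")
    checker.access(thread, "y")
    checker.release(thread, "mu")

# Variable "x" is accessed under "mu" by T1 but with no lock by T2.
checker.acquire("T1", "mu")
checker.access("T1", "x")
checker.release("T1", "mu")
checker.access("T2", "x")

print(checker.warnings)
```

Because every shared access must pass through a call such as `access`, instrumentation of this kind is what gives rise to the slowdown and the need to modify the monitored software noted above.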
Another way to check for race conditions is by static race detection mechanisms, for example the “Extended Static Checking” (ESC) system, which is described in HP Labs Tech reports SRC-RR-159 by Detlefs et al. http://www.hpl.hp.com/techreports/compaq-DEC/SRC-RR-159.HTML, and Engler et al.'s MC system, which is described in “Checking System Rules Using System-Specific, Programmer-Written Compiler Extensions” in Proceedings of the 4th Symposium on Operating Systems Design and Implementation.
Both of these static approaches work by performing a static analysis of the program which aims to detect most race conditions in the program but does not prove their absence. MC works by checking that the programmer follows established (manually verified) idioms when acquiring and releasing locks, while ESC performs deeper reasoning about programs and can also detect problems such as writing beyond the bounds of an array. Static analysis avoids the overhead associated with dynamic detection but is unable to find some race conditions that dynamic detection can catch. In addition, static analysis tools generally support only a limited set of locking conventions and, when a different locking convention is used, are either unable to check the program or report many false positives.
There are many tools available for profiling a processing system to provide an indication of its performance. For example, many operating systems have tools which produce graphs showing how busy each CPU is, how much memory is in use and how these figures vary over time.
These profiling tools can rely on performance counters provided by the system hardware. For example, most modern processors can count events like number of instructions executed or number of cache misses. These counters are typically used in two modes. In one mode (we will call this “one-shot mode”), the programmer (or a tool) modifies the program to turn the counter on at the start of a task and to read the value of the counter at the end of the task. This value is presented to the user or is stored to present to the user at a later point. In the other mode (which we call “periodic sampling”), a periodic interrupt triggers a library to read the performance counter and stream values out to storage for later examination. They may, for example, be displayed as a graph.
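The two modes can be contrasted with a small simulation. The sketch below uses a hypothetical `SimulatedCounter` class as a stand-in for a hardware event counter, since real counters are read through platform-specific interfaces; the event data are invented purely for illustration.

```python
class SimulatedCounter:
    """Hypothetical stand-in for a hardware event counter."""

    def __init__(self, events_per_tick):
        self.events_per_tick = events_per_tick  # events occurring in each tick
        self.now = 0
        self.value = 0

    def advance(self, ticks):
        """Let simulated time pass, accumulating events as it goes."""
        for _ in range(ticks):
            self.value += self.events_per_tick[self.now]
            self.now += 1

    def read(self):
        return self.value


def one_shot(counter, task_ticks):
    """Read the counter at the start and at the end of a task."""
    start = counter.read()
    counter.advance(task_ticks)  # the task runs here
    return counter.read() - start


def periodic_sampling(counter, total_ticks, period):
    """Read the counter every `period` ticks and record the increments."""
    samples = []
    last = counter.read()
    for _ in range(total_ticks // period):
        counter.advance(period)
        current = counter.read()
        samples.append(current - last)
        last = current
    return samples


events = [1, 0, 2, 3, 0, 1, 4, 0]  # invented events per tick
total = one_shot(SimulatedCounter(events), task_ticks=8)
samples = periodic_sampling(SimulatedCounter(events), total_ticks=8, period=2)
print(total, samples)
```

One-shot mode yields a single total for the task, whereas periodic sampling yields a series of increments suitable for display as a graph. In both functions it is the monitored processor itself that calls `read`.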
In both cases, the processor is required to read the current counter value. This requirement makes the process invasive (the act of measuring may disrupt the results). It also imposes an overhead on the processor responsible for reading the performance counter which, ultimately, limits the granularity and possibly the precision at which measurements can be made.
Compared with one-shot mode, using performance counters in “periodic sampling” mode has the disadvantage that the task one wishes to monitor may start halfway through a sampling period and end halfway through another sampling period, resulting in imprecise performance figures. This imprecision can be reduced by shortening the sampling period, but doing so increases the degree of invasiveness. Moreover, as long as some imprecision remains, one cannot say for sure whether an event happened during a given task.
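The imprecision can be illustrated with a worked example. The sketch below assumes invented per-tick event counts and a task occupying ticks 3 to 6. From the samples alone, one can state only a lower bound (samples lying wholly inside the task) and an upper bound (samples overlapping the task at all) on the number of events attributable to the task. The function name is hypothetical.

```python
def sampled_bounds(events, period, task_start, task_end):
    """Bound the events attributable to a task occupying the half-open
    tick interval [task_start, task_end), given only periodic samples."""
    lower = upper = 0
    for window_start in range(0, len(events), period):
        window_end = min(window_start + period, len(events))
        count = sum(events[window_start:window_end])
        if task_start <= window_start and window_end <= task_end:
            # Sample lies wholly inside the task: definitely attributable.
            lower += count
            upper += count
        elif window_start < task_end and task_start < window_end:
            # Sample straddles a task boundary: possibly attributable.
            upper += count
    return lower, upper


events = [1, 0, 2, 3, 0, 1, 4, 0]  # invented events per tick
true_count = sum(events[3:7])      # what one-shot mode would report

bounds = {period: sampled_bounds(events, period, 3, 7) for period in (4, 2, 1)}
print(true_count, bounds)
```

With a period of 4 ticks the samples show only that between 0 and 11 events occurred during the task; a period of 2 narrows this to between 1 and 10; only a period of 1 tick recovers the true count of 8, at the cost of reading the counter on every tick.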
A variation on periodic sampling of performance counters is random sampling. This solves some problems that occur when the sampling period is an exact multiple of the period of the events one is measuring, but it does not reduce the imprecision.
The problems of precision and invasiveness can be tackled by providing hardware to trace all events of interest. An off-chip tool can summarise this trace to produce a very detailed, accurate graph. The disadvantage of such an approach is that if the events occur at high frequency, a large amount of bandwidth is required to produce the trace, a large amount of storage is required to store it and a large amount of processing is required to summarise it. Such a trace is useful if one wants to examine the events individually but is overkill for performance profiling or for detecting race conditions.
It would be desirable to be able to analyse a complex system in real time without the need to collect, and therefore to output and process, a very large amount of data.