Collecting performance data in an operating computer system is a frequent and extremely important task performed by hardware and software engineers. Hardware engineers need performance data to determine how new computer hardware operates with existing operating systems and application programs.
Specific designs of hardware structures, such as processors, memories and caches, can have drastically different, and sometimes unpredictable, utilizations for the same set of programs. It is important that flaws in the hardware be identified so that they can be corrected in future designs. Performance data can identify how efficiently software uses hardware, and can be helpful in designing improved systems.
Software engineers need to identify critical portions of programs. For example, compiler writers would like to find out how the compiler schedules instructions for execution, or how well the execution of conditional branches is predicted, to provide input for software optimization. Similarly, it is important to understand the performance of the operating system kernel, device drivers and application software programs. The performance information helps identify performance problems and facilitates both manual tuning and automated optimization.
Accurately monitoring the performance of hardware and software systems without disturbing the operating environment of the computer system is difficult, particularly if the performance data is collected over extended periods of time, such as many days, or weeks. In many cases, performance monitoring systems are custom designed. Costly hardware and software modifications may need to be implemented to ensure that operation of the system is not affected by the monitoring systems.
One method of monitoring computer system performance stores the addresses of the executed instructions from the program counter. Another method monitors computer system performance using hardware performance counters that are implemented as part of the processor circuitry. Hardware performance counters “count” occurrences of significant events in the system. Significant events can include, for example, cache misses, the number of instructions executed, and I/O data transfer requests. By periodically sampling the counts stored in the performance counters, the performance of the system can be deduced.
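As an illustration of how periodic sampling of counters can yield performance figures, the following minimal sketch takes successive counter readings and computes the change over each sampling interval. It does not read real hardware: the function `make_fake_counter_source`, the counter names `instructions` and `cache_misses`, and the increments used are hypothetical stand-ins chosen only for this example.

```python
from typing import Callable, Dict, List

CounterReading = Dict[str, int]


def sample_deltas(read_counters: Callable[[], CounterReading],
                  num_samples: int) -> List[CounterReading]:
    """Read the counters num_samples + 1 times and return per-interval deltas."""
    previous = read_counters()
    deltas = []
    for _ in range(num_samples):
        current = read_counters()
        # The events that occurred during this interval are the difference
        # between two successive readings of each counter.
        deltas.append({name: current[name] - previous[name] for name in current})
        previous = current
    return deltas


def make_fake_counter_source() -> Callable[[], CounterReading]:
    """Hypothetical counter source: every read advances the counts by fixed steps."""
    state = {"instructions": 0, "cache_misses": 0}

    def read() -> CounterReading:
        state["instructions"] += 1000  # assumed value, for illustration only
        state["cache_misses"] += 25    # assumed value, for illustration only
        return dict(state)

    return read


def miss_rate(delta: CounterReading) -> float:
    """Cache misses per executed instruction within one sampling interval."""
    return delta["cache_misses"] / delta["instructions"]


if __name__ == "__main__":
    for delta in sample_deltas(make_fake_counter_source(), 3):
        print(delta, miss_rate(delta))
```

In an actual system the readings would come from the processor's performance counters, and the sampling period would be chosen to keep the disturbance of the monitored system small.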
Data values, such as the values of hardware registers and memory locations, are also useful in developing performance profiles of programs. Value usage patterns indicate which values programs repeatedly use. Such patterns can be used to perform both manual and automated optimizations. One prior art method adds instrumentation code to the program to be profiled and collects data values. However, the instrumentation code increases overhead. In addition, instrumentation-based approaches do not allow overall system activity, such as that of shared libraries, the kernel and device drivers, to be profiled. Although simulation of complete systems can generate a profile of overall system activity, such simulations are expensive and difficult to apply to workloads of production systems.
It is also useful to measure the execution time of load and store instructions in executing computer programs. Load and store instructions access memory, and the execution time of a load or store instruction is equal to the memory access time, or memory latency. Typically, special hardware is required to measure the memory latency. Other methods require that the program being profiled be modified.
Therefore, a method and system are needed that perform value profiling of memory latencies without requiring changes to programs or hardware modifications.
Different levels of memory (for example, L1 cache, L2 cache and main DRAM memory) have different access times. In particular, load instructions may access any one of the levels of memory. Therefore, the method and system should also identify which level of memory was accessed by a load instruction. The method and system should also monitor memory latency without modifying the computer program and allow the monitoring of shared libraries, the kernel and device drivers, in addition to application programs.
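Because each level of memory has a different access time, a measured load latency can in principle be mapped back to the level that served the access. The sketch below shows one simple way to do this with fixed upper bounds per level. The bounds in `LEVEL_UPPER_BOUNDS` (4 and 20 cycles) and the function names are assumptions made for illustration; real values depend on the particular processor and memory system and are not given in the text above.

```python
from collections import Counter
from typing import Iterable

# Hypothetical upper bounds, in processor cycles, for a load served by each
# cache level. Anything slower is attributed to main memory.
LEVEL_UPPER_BOUNDS = [
    ("L1 cache", 4),
    ("L2 cache", 20),
]


def classify_latency(cycles: int) -> str:
    """Return the memory level most likely to have served a load of this latency."""
    for level_name, upper_bound in LEVEL_UPPER_BOUNDS:
        if cycles <= upper_bound:
            return level_name
    return "main memory"


def level_histogram(latencies: Iterable[int]) -> Counter:
    """Count how many sampled loads were served by each memory level."""
    return Counter(classify_latency(cycles) for cycles in latencies)


if __name__ == "__main__":
    sampled_latencies = [2, 3, 15, 200]  # made-up sample values
    print(level_histogram(sampled_latencies))
```

A histogram of this kind summarizes how often loads in a profiled program reach each level of the memory hierarchy.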