This invention relates to a computer system with a hardware monitor, and in particular to a performance evaluation system that evaluates performance based on information collected by the hardware monitor, as well as to a computer system that restructures its hardware configuration according to the result of the performance evaluation.
The first step of performance tuning in computer systems such as database servers and application servers is to analyze performance bottlenecks. In a common method for analyzing performance bottlenecks, bottleneck locations are identified comprehensively based on the CPU utilization ratio, the CPU queue length, memory paging, swapping, the I/O throughput, and other data obtained by a performance monitor in an OS or the like, as described in “System Performance Tuning”, 2nd ed. Musumeci, Gian-Paolo D. and Loukides, Mike. O'Reilly Media, Inc., (Japanese translation title: “UNIX System Performance Tuning”, O'Reilly Japan, Inc., 2003), and in “High Performance Client Server: A Guide to Building and Managing Robust Distributed Systems”, Looseley, Chris and Douglas, Frank. John Wiley & Sons Inc., 1998, (Japanese translation title: “256 Rules of Database Tuning”, Nikkei Business Publications, Inc., 1999).
The throughput in transaction processing of a server is commonly calculated by the following formula:
(Throughput performance) = (CPU count × CPU frequency × constant) / (CPU execution step count × CPI)
The above constant is a number for converting the throughput value into throughput per hour or throughput per second. CPI is the count of execution cycles per instruction of a CPU. The performance can be improved by increasing the CPU count and the CPU frequency while reducing the execution step count and CPI.
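As a rough illustration of the throughput formula, the following Python sketch evaluates it for a hypothetical server. The function name, the parameter names, and the sample values are assumptions made for illustration only; the constant is taken to be 1 so that a frequency in Hz yields transactions per second.

```python
def throughput(cpu_count, cpu_freq_hz, steps_per_txn, cpi, constant=1.0):
    """(CPU count x CPU frequency x constant) / (execution step count x CPI).

    All names are illustrative; steps_per_txn is the CPU execution step
    count for one transaction, and cpi is cycles per instruction.
    """
    if steps_per_txn <= 0 or cpi <= 0:
        raise ValueError("step count and CPI must be positive")
    return (cpu_count * cpu_freq_hz * constant) / (steps_per_txn * cpi)


# Hypothetical server: 4 CPUs at 2 GHz, 1,000,000 steps per transaction, CPI 2.0
base = throughput(4, 2.0e9, 1_000_000, 2.0)      # 4000.0 transactions per second

# Halving CPI has the same effect on throughput as doubling the CPU count.
lower_cpi = throughput(4, 2.0e9, 1_000_000, 1.0)  # 8000.0
more_cpus = throughput(8, 2.0e9, 1_000_000, 2.0)  # 8000.0
```

The comparison at the end shows why reducing CPI is treated as a tuning target in its own right: in this formula it is interchangeable with adding CPUs.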
CPI is calculated by the following formula:
CPI = CPI0 + (L1 cache miss ratio − L2 cache miss ratio) × (memory latency of L2 cache) × Kc + (L2 cache miss ratio) × (memory latency of main memory) × Km
where CPI0 represents the count of execution cycles per instruction when the L1 cache has an infinite capacity, and Kc and Km represent constant values for offsetting the effects of multiple memory accesses.
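The CPI formula can be sketched in Python as follows. This is only an illustration: the function and parameter names are hypothetical, the miss ratios are assumed to be expressed as misses per instruction, the latencies are assumed to be in CPU cycles, and Kc and Km default to 1 (no offset for multiple memory accesses). The sample values are invented.

```python
def cpi(cpi0, l1_miss, l2_miss, l2_latency, mem_latency, kc=1.0, km=1.0):
    """CPI = CPI0 + (L1 miss - L2 miss) * L2 latency * Kc
                  + (L2 miss) * main memory latency * Km

    Assumption: miss ratios are misses per instruction, latencies are cycles.
    """
    # Accesses that miss in L1 but hit in L2 pay the L2 cache latency.
    l2_term = (l1_miss - l2_miss) * l2_latency * kc
    # Accesses that miss in L2 as well pay the main memory latency.
    mem_term = l2_miss * mem_latency * km
    return cpi0 + l2_term + mem_term


# Hypothetical values: CPI0 = 1.0, 5% L1 misses, 1% L2 misses,
# 10-cycle L2 latency, 200-cycle main memory latency.
fast_memory = cpi(1.0, 0.05, 0.01, 10, 200)   # about 3.4
slow_memory = cpi(1.0, 0.05, 0.01, 10, 400)   # about 5.4
```

With these invented numbers, doubling the main memory latency raises CPI from about 3.4 to about 5.4, which illustrates how strongly the main memory term can dominate even at a low L2 miss ratio.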
As mentioned above, the performance can be improved by increasing the CPU count and the CPU frequency, which are determined by how many CPUs are in the server and how many of the CPUs are put to use. A conventional way to reduce the CPU execution step count is well-thought-out coding, or code optimization by a compiler.
A technique for reducing CPI has been proposed that uses a compiler or a CPU's instruction scheduler to increase the degree of parallel instruction execution. However, it is known that certain combinations of workload and main memory latency can change the performance drastically, since the memory latency varies depending on the system operation state and the hardware configuration.
As an alternative to this technique, an instruction scheduling method is being considered that uses measurement results of the memory latency (see U.S. Pat. No. 6,092,180, for example). In this method, the memory latencies of instructions executed by a processor are sampled to record the relation between an instruction and its memory latency. The instruction scheduler changes the order of executing instructions such that an instruction that has a long memory latency is executed before other instructions as much as possible. The instruction scheduler may instead determine where to insert a pre-fetch instruction. This method makes it possible to tune to the memory latency of a server in which the program is run, and to effectively use the CPU time, which is one of the hardware resources.
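The reordering idea described above can be illustrated with a minimal Python sketch. It is not the method of the cited patent; it is a simple list scheduler, written under the assumption that each instruction has one sampled latency value and a known set of instructions it depends on. The function name, the data layout, and the instruction names are all hypothetical.

```python
import heapq


def schedule(latency, deps):
    """Order instructions so that, among those whose dependencies are met,
    the one with the longest sampled memory latency is issued first.

    latency: dict mapping instruction name -> sampled latency in cycles
    deps:    dict mapping instruction name -> set of names it depends on
    """
    remaining = {name: set(deps.get(name, ())) for name in latency}
    users = {name: [] for name in latency}
    for name, needed in remaining.items():
        for d in needed:
            if d not in latency:
                raise ValueError(f"unknown dependency: {d}")
            users[d].append(name)

    # heapq is a min-heap, so latency is negated to pop the longest first;
    # the name breaks ties deterministically.
    ready = [(-latency[n], n) for n, needed in remaining.items() if not needed]
    heapq.heapify(ready)

    order = []
    while ready:
        _, name = heapq.heappop(ready)
        order.append(name)
        for u in users[name]:
            remaining[u].discard(name)
            if not remaining[u]:
                heapq.heappush(ready, (-latency[u], u))

    if len(order) != len(latency):
        raise ValueError("dependency cycle detected")
    return order


# Hypothetical samples: load_a usually misses the cache, load_b usually hits.
sampled = {"load_b": 12, "add": 1, "load_a": 200, "store": 1}
needs = {"add": {"load_a", "load_b"}, "store": {"add"}}
print(schedule(sampled, needs))  # ['load_a', 'load_b', 'add', 'store']
```

In this example the long-latency load is moved ahead of the short one, so that its memory access overlaps with as much of the remaining work as the dependencies allow.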