1. Technical Field of the Invention
This invention relates generally to methods of predicting the instruction execution efficiency in a computer system, and more specifically, to methods of predicting the instruction execution efficiency in a proposed computer system having a proposed computer system architecture and configuration.
2. Description of the Prior Art
Modern computer systems can have a wide variety of computer architectures and configurations. To optimize efficiency, a computer system should have an architecture and configuration that is suitable for an expected load. If the architecture or configuration is excessive for a particular load, some of the computer resources will be wasted. If the architecture or configuration is not sufficiently robust for a particular load, the computer system will not provide adequate performance.
A high performance desktop computer designed for multi-media or graphical applications often has a standard PC architecture, with a relatively large amount of Random Access Memory (RAM), large hard drives, and one or more processors with fairly high clock rates. Multi-media and graphical applications are often computationally and/or memory intensive, thereby requiring relatively large amounts of memory and processing capability. In contrast, a desktop computer system designed for office use may have a standard PC architecture, but will often have far less RAM, a smaller hard drive and a single processor with less performance. The reduced computer resources of office type systems are appropriate because of the fairly light load of many office applications such as word processing.
For more complex computer systems, such as on-line transaction processing systems, both the architecture and the configuration of the computer system are often designed to accommodate the expected load. The overall throughput of such systems is often dependent on a number of inter-related factors including, for example, the overall architecture of the computer system, the configuration of the computer resources within the architecture, and the expected load and load type.
The architecture of a computer system may include, for example, the location of cache memory, the number of cache memory levels, the location of main memory, the location of processors within the system, the internal bus structure, the I/O structure, as well as other architectural details. The configuration of computer resources within the architecture may include, for example, the size and speed of each level of cache memory, and the number and speed of the processors.
The expected load should be taken into account when designing a computer system, and in particular, when selecting an architecture and/or configuration for the computer system. During the development of a computer system, the developer typically has some idea of the expected load for the system. Often, the expected load for the computer system is estimated by examining the software that will be run on the system. To help design a robust computer system that can efficiently handle the expected loads, it is important for the developer to have some way of evaluating the performance of a proposed computer system based on the expected load, before the system is actually completely developed. This allows the developer to evaluate many different computer architectures and/or configurations before selecting a particular architecture and/or configuration for the particular application.
One measure of a computer system's performance is the computation time required to process a transaction. This can be derived from the computer system's minimum latency period and its queuing time, sometimes using an analytical model as described below. The queuing time can be computed from the resource utilization, which, in turn, can be computed from the computer system's speed in processing the transactions.
Two elements play a key role in determining how efficiently a computer system executes user queries. These include the amount of memory and the processing capability of the system. The amount of memory affects how many instructions are required to retrieve the information necessary to complete a transaction. If, for example, the amount of memory in the computer system is relatively large, the information necessary to complete a transaction is more likely to be present in memory, and therefore it is less likely that the operating system will have to generate and submit additional instructions to access a disk or the like via an I/O channel.
The number and speed of the processors in the computer system can also affect how fast and efficiently a computer system executes user queries. As more processors are added, the instructions necessary to complete a particular transaction are executed faster. However, adding more processors increases the load on the memory, which increases the chance that the requested information will not be in the memory. This can increase the chance that the operating system will have to generate and submit additional instructions to access a disk or the like via an I/O channel.
In addition, as more processors are added, more instruction cycles tend to be dedicated to overhead because of conflicts or other interactions between processors. For example, as more processors are added, more interrupts, dispatches, conflicts resulting in spin/lock loops, I/O locking conflicts, etc. are typically encountered, all of which reduce the efficiency of the computer system. Thus, there is an interplay between the amount of memory and the processing capability of a computer system that affects how fast and efficiently the computer system can execute user queries.
A primary way for a developer to evaluate and predict computer system performance is to develop computer performance models. Such models have traditionally been developed using either probabilistic evaluation (analytic models) or discrete event simulation programs (simulation models).
An analytic model is often defined to be a model that accepts moment estimators (such as mean arrival and service times) as its input and, using a closed form or iterative method, produces moment estimators for the desired statistics (such as average wait time). Analytic modeling has proven to be applicable in a wide range of computer system performance evaluation problems, and is the primary method used commercially today.
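As a minimal sketch of what such a model can look like, the following Python fragment assumes the simplest textbook case, a single-server M/M/1 queue. The choice of queue and the name `mm1_analytic_model` are assumptions made for illustration only; the text above does not name a specific model. The function accepts mean inter-arrival and mean service times as input and returns the utilization and the average wait time in closed form.

```python
def mm1_analytic_model(mean_interarrival, mean_service):
    """Closed-form M/M/1 estimates (an assumed model, for illustration only)."""
    arrival_rate = 1.0 / mean_interarrival   # transactions per unit time
    service_rate = 1.0 / mean_service        # transactions the server can finish per unit time
    utilization = arrival_rate / service_rate
    if utilization >= 1.0:
        raise ValueError("unstable queue: utilization must be below 1")
    # Mean time a transaction spends waiting in the queue before service begins.
    mean_wait = utilization / (service_rate - arrival_rate)
    return {"utilization": utilization, "mean_wait": mean_wait}


# Hypothetical inputs: one arrival every 2 time units, 1 time unit of service each.
result = mm1_analytic_model(mean_interarrival=2.0, mean_service=1.0)
print(result)
```

The inputs and outputs are all averages, which is what distinguishes this style of model from one driven by individual events.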
There are some fundamental drawbacks to analytic modeling. One drawback is that each analytic model is based on actual design specifications of a computer system. Thus, the computer system must already be sufficiently designed before any meaningful analysis can be performed. In addition, direct measurements have shown that many computer systems seriously violate the underlying assumptions of analytic models, and the actual distributions of the analytic modeling parameters must often be simplified, both of which tend to compromise the accuracy of the results. Finally, significant time and expense are required to develop an analytic model, which, as indicated above, is typically designed for a particular computer system and configuration. To calculate the performance for another computer system or configuration, the analytic model must typically be redesigned to fit the characteristics of the new system. This can be a time consuming, tedious and expensive task.
Simulation models are primarily useful in studying computer performance at a high level of detail. A simulation model may be defined to be a model which accepts a set of measured or generated events (such as arrival or service requests) as its input and produces performance data corresponding thereto. Unfortunately, the processing requirements needed to run the simulations are related to the level of detail of such models. Because many of today's systems are very large and complex, detailed simulations are often impractical because of the inordinate amount of processing time required to produce performance data.
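The definition above can be illustrated with a deliberately tiny sketch. It assumes a single server that handles requests in arrival order, which is a simplification chosen here and not a model taken from the text; the function name `simulate_fifo_server` and the event values are likewise hypothetical. Each event is processed individually, which is why the cost of a simulation grows with the number and detail of the events.

```python
def simulate_fifo_server(events):
    """Replay (arrival_time, service_time) events through one FIFO server.

    `events` must be sorted by arrival time. Returns simple performance data.
    """
    server_free_at = 0.0
    waits = []
    for arrival, service in events:
        start = max(arrival, server_free_at)  # wait if the server is still busy
        waits.append(start - arrival)
        server_free_at = start + service
    return {
        "mean_wait": sum(waits) / len(waits),
        "finish_time": server_free_at,
    }


# Hypothetical event list: three requests, each needing 2 time units of service.
events = [(0.0, 2.0), (1.0, 2.0), (2.0, 2.0)]
print(simulate_fifo_server(events))
```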
Statistical techniques have also been used to augment and assist conventional analytic and simulation approaches, and also to aid in their evaluation. For example, statistical techniques have been used to provide a sub-model portion of, for example, an overall cache memory simulation model. While such usage of statistical modeling offers the possibility of reducing the complexity and processor requirements of some simulation models, it often does not reduce the simulation times to desirable levels unless the sub-models are oversimplified, which results in reduced accuracy.
Performance projections for processors and memory subsystems are often critically dependent upon a correct understanding of the workloads which are imposed on such systems. In order to accurately predict the performance of a proposed system to assist in selecting among the various design tradeoffs, some prior art systems collect instruction streams (i.e., "traces") that statistically represent actual workloads. By using traces that represent a fixed workload as input to a system model that allows variations on some hardware parameters, such as the number of processors, some developers hope to predict performance for that workload versus number of processors.
A limitation of using representative trace data is that the traces can become very large, even for fairly simple instruction streams. A number of methods for minimizing the length of the trace data are disclosed in, for example, U.S. patent application Ser. No. 09/747,050, entitled "System and Method for High Speed, Low Cost Address and Bus Signal Tracing", U.S. patent application Ser. No. 09/745,813, entitled "High Speed Processor Interconnect Tracing Compaction Using Selectable Triggers", and U.S. patent application Ser. No. 09/747,046, entitled "Coordination of Multiple Processor Bus Tracings for Enable Study of Multiprocessor Multi-Bus Computer Systems", all of which are assigned to the assignee of the present invention and all of which are incorporated herein by reference. Even using these methods, however, the size of the trace data can become large, particularly for systems that have a relatively large number of processors and/or a relatively large cache memory.
The present invention overcomes many of the disadvantages of the prior art by providing methods and systems for efficiently predicting the instruction execution efficiency of a proposed computer system. This is preferably accomplished by first measuring or otherwise obtaining actual instruction execution efficiency values for two or more actual computer systems. Each of the actual instruction execution efficiency values may correspond to, for example, the number of instructions that are executed per unit of work, which can be derived from the number of instructions that are executed during a predetermined software code section or portion thereof.
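As a small illustration of the efficiency value described above, the following sketch divides the number of instructions executed during a measured code section by the units of work completed in that section. The function name and the sample counts are hypothetical and are not measurements from any actual system.

```python
def instructions_per_unit_of_work(instruction_count, units_of_work):
    """Instruction execution efficiency as instructions per unit of work."""
    if units_of_work <= 0:
        raise ValueError("at least one unit of work must be completed")
    return instruction_count / units_of_work


# Hypothetical measurement: 1,500,000 instructions while completing 3 transactions.
print(instructions_per_unit_of_work(1_500_000, 3))
```

Under this definition a lower value means that less instruction overhead is spent on each unit of work.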
Each of the actual computer systems preferably has a different allocation of resources, and in particular, a different allocation of resources of a first resource type and a second resource type. In one illustrative embodiment, the resources of the first resource type may correspond to memory (e.g., cache and/or main memory) and the resources of the second resource type may correspond to processing capability (e.g., number and/or speed of the processors). Each of the actual instruction execution efficiency values is preferably measured while executing a predetermined set of software codes, such as a TPC benchmark.
As noted above, two elements play a key role in determining how efficiently a computer system executes user queries. These include the amount of memory and the processing capability of the system. The amount of memory affects how many instructions are required to retrieve the information necessary to complete a transaction. If, for example, the amount of memory in the computer system is relatively large, the information necessary to complete a transaction is more likely to be present in memory, and therefore it is less likely that the operating system will have to generate and submit additional instructions to access an external hard disk or the like via an I/O channel.
The number and speed of the processors in the system can also affect how fast and efficiently a computer system executes user queries. As more processors are added, the instructions that are necessary to complete a particular transaction are executed faster. However, adding more processors increases the load on the memory, which increases the chance that the requested information will not be in the memory. This can increase the chance that the operating system will have to generate and submit additional instructions to access a disk or the like via an I/O channel.
Also, as more processors are added, more instruction cycles tend to be dedicated to overhead because of conflicts or other interactions between processors. For example, as more processors are added, more interrupts, dispatches, conflicts resulting in spin/lock loops, I/O locking conflicts, etc. are typically encountered, all of which reduce the efficiency of the computer system. Thus, there is an interplay between the amount of memory and the processing capability of a computer system that affects how fast and efficiently the computer system can execute user queries.
To assess this interplay, actual instruction execution efficiency values are preferably measured using a sufficient number of actual computer systems that have a sufficient variety of resource allocations to create a statistically significant pool of information or data. Using this pool of information or data, a predicted instruction execution efficiency value can be calculated for a proposed computer system having a proposed allocation of resources of the first resource type and the second resource type. This is preferably accomplished by performing a multi-variant regression analysis of selected actual instruction execution efficiency values in the pool of information or data. The multi-variant regression analysis preferably identifies a first contribution or function for the first resource type and a second contribution or function for the second resource type. The first contribution or function and the second contribution or function can then be used in conjunction with the proposed resource allocation of the first resource type and the second resource type, respectively, to predict the instruction execution efficiency value of the proposed computer system.
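The regression step can be sketched as follows, under the assumption that each contribution is linear, so that efficiency is modeled as b0 + b1 * memory + b2 * processors. The linear form, the function names, and every number in the sample pool are assumptions made for illustration; the description above does not fix the form of the two contributions. The fit uses ordinary least squares, solved with plain Python so that the example has no dependencies.

```python
def fit_two_resource_regression(samples):
    """Least-squares fit of efficiency = b0 + b1*memory + b2*processors.

    `samples` is a list of (memory, processors, efficiency) tuples taken from
    actual systems. Returns the coefficients (b0, b1, b2).
    """
    rows = [(1.0, float(m), float(p)) for m, p, _ in samples]
    ys = [float(y) for _, _, y in samples]
    # Build the normal equations (X^T X) b = X^T y as an augmented 3x4 matrix.
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * y for r, y in zip(rows, ys))]
        for i in range(3)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("resource allocations are not varied enough to fit")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return tuple(a[i][3] / a[i][i] for i in range(3))


def predict_efficiency(coeffs, memory, processors):
    """Apply the fitted contributions to a proposed resource allocation."""
    b0, b1, b2 = coeffs
    return b0 + b1 * memory + b2 * processors


# Made-up pool of (memory in GB, processor count, instructions per transaction
# in thousands); these are illustrative values, not measured data.
pool = [(4, 1, 1010), (8, 1, 970), (4, 2, 1060), (8, 4, 1120), (16, 2, 940)]
coeffs = fit_two_resource_regression(pool)
print(predict_efficiency(coeffs, memory=16, processors=4))
```

The fit fails with an error when the sampled systems do not vary both resource types independently, which mirrors the requirement above for a sufficient variety of resource allocations.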
Because the instruction execution efficiency value of a computer system is typically dependent on both the size of the memory and the load on the memory, it may be desirable to perform a data transformation on the memory size of the actual computer system before performing the multi-variant regression. One illustrative data transformation includes a memory load value divided by the memory size. The memory load value may be related to, for example, the number of transactions per minute produced by the actual computer system, the number and/or speed of the processors used in the actual computer system, or some other metric related to the processing capability of the actual computer system.
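A minimal sketch of this transformation follows, assuming that transactions per minute serves as the memory load value. The function name, the units, and the sample figures are hypothetical. The transformed value would stand in for the raw memory size as the memory-related input to the regression.

```python
def transformed_memory_value(memory_load, memory_size):
    """Return memory load divided by memory size (load per unit of memory)."""
    if memory_size <= 0:
        raise ValueError("memory size must be positive")
    return memory_load / memory_size


# Hypothetical systems: (transactions per minute, memory size in GB).
systems = [(20000.0, 8.0), (20000.0, 16.0), (40000.0, 16.0)]
for tpm, mem_gb in systems:
    print(tpm, mem_gb, transformed_memory_value(tpm, mem_gb))
```

Two systems with the same load per unit of memory receive the same transformed value even when their raw memory sizes differ, which is the effect the transformation is meant to capture.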