1. Technical Field of the Invention
This invention relates generally to methods of predicting the performance of a cache memory in a computer system, and more specifically, to methods of predicting the performance of a cache memory in a proposed computer system having a proposed computer system architecture and configuration.
2. Description of the Prior Art
Modern computer systems can have a wide variety of computer architectures and configurations. To optimize efficiency, a computer system should have an architecture and configuration that is suitable for an expected load. If the architecture or configuration is excessive for a particular load, some of the computer resources will be wasted. If the architecture or configuration is not sufficiently robust for a particular load, the computer system will not provide adequate performance.
A high performance desktop computer designed for multi-media or graphical applications often has a standard PC architecture, with a relatively large amount of Random Access Memory (RAM), large hard drives, and one or more processors with fairly high clock rates. Multi-media and graphical applications are often computational and/or memory intensive, thereby requiring relatively large amounts of memory and processing capability. In contrast, a desktop computer system designed for office use may have a standard PC architecture, but will often have far less RAM, a smaller hard drive and a single, lower-performance processor. The reduced computer resources of office-type systems are appropriate because of the fairly light load of many office applications, such as word processing.
For more complex computer systems, such as on-line transaction processing systems, both the architecture and the configuration of the computer system are often designed to accommodate the expected load. The overall throughput of such systems is often dependent on a number of inter-related factors including, for example, the overall architecture of the computer system, the configuration of the computer resources within the architecture, and the expected load and load type.
The architecture of a computer system may include, for example, the location of cache memory, the number of cache memory levels, the location of main memory, the location of processors within the system, the internal bus structure, the I/O structure, as well as other architectural details. The configuration of computer resources within the architecture may include, for example, the size and speed of each level of cache memory, and the number and speed of the processors.
The expected load should be taken into account when designing a computer system, and in particular, when selecting an architecture and/or configuration for the computer system. During the development of a computer system, the developer typically has some idea of the expected load for the system. Often, the expected load for the computer system is estimated by examining the software that will be run on the system. To help design a robust computer system that can efficiently handle the expected loads, it is important for the developer to have some way of evaluating the performance of a proposed computer system based on the expected load, before the system is actually completely developed. This may allow the developer to evaluate many different computer architectures and/or configurations before selecting a particular architecture and/or configuration for the particular application.
A primary way for a developer to evaluate and predict computer system performance is to develop computer performance models. Such models have traditionally been developed using either probabilistic evaluation (analytic models) or discrete event simulation programs (simulation models).
An analytic model is often defined to be a model that accepts moment estimators (such as mean arrival and service times) as its input and, using a closed form or iterative method, produces moment estimators for the desired statistics (such as average wait time). Analytic modeling has proven to be applicable in a wide range of computer system performance evaluation problems, and is the primary method used commercially today.
There are some fundamental drawbacks to analytic modeling. One drawback is that not all discrete systems can be evaluated in this manner. Furthermore, direct measurements have shown that many computer systems seriously violate the underlying assumptions of analytic models. Cache memory systems have presented a particular problem because of the large quantity and diverse nature of today's cache memory workloads, which create arrival and service distributions which are not only extremely variable, but do not conform to those conventionally assumed for these models. Thus, such models provide severely limited results, which limits the ability of a developer to predict the performance of different cache memory configurations in a proposed computer system. Also, the actual distributions of the analytic modeling parameters often must be simplified, which further compromises the accuracy of the results.
Simulation models are primarily useful in studying computer performance at a high level of detail. A simulation model may be defined to be a model which accepts a set of measured or generated events (such as arrival or service requests) as its input and produces performance data corresponding thereto. Unfortunately, the processing requirements needed to run the simulations are related to the level of detail of such models. Because many of today's systems are very large and complex, detailed simulation is rarely used commercially because of the inordinate amount of processing time required to produce performance data. Also, and as is the case for analytic modeling, the ability of simulation models to predict the performance of different cache memory configurations is severely limited because of the large quantity and diverse nature of modern day cache memory workloads.
Statistical techniques have also been used to augment and assist conventional analytic and simulation approaches, and also to aid in their evaluation. For example, statistical techniques have been used to provide a sub-model portion of, for example, an overall cache memory simulation model. While such usage of statistical modeling offers the possibility of reducing the complexity and processor requirements of some simulation models, it often does not reduce the simulation times to desirable levels unless the sub-models are oversimplified, which results in reduced accuracy.
Performance projections for processors and memory subsystems are often critically dependent upon a correct understanding of the workloads which are imposed on such systems. In order to accurately predict the performance of a proposed system to assist in selecting among the various design tradeoffs, some prior art systems collect instruction streams (i.e., "traces") that statistically represent actual workloads. By using traces that represent a fixed workload as input to a system model that allows variations on some hardware parameters, such as the number of processors, some developers hope to predict performance for that workload versus the number of processors.
A limitation of using representative trace data is that the traces can become very large, even for fairly simple instruction streams. A number of methods for minimizing the length of the trace data are disclosed in, for example, U.S. patent application Ser. No. 09/747,050, filed Dec. 21, 2000, entitled "System and Method for High Speed, Low Cost Address and Bus Signal Tracing", U.S. patent application Ser. No. 09/745,813, filed Dec. 21, 2000, entitled "High Speed Processor Interconnect Tracing Compaction Using Selectable Triggers", and U.S. patent application Ser. No. 09/747,046, filed Dec. 21, 2000, entitled "Coordination of Multiple Processor Bus Tracings for Enable Study of Multiprocessor Multi-Bus Computer Systems", all of which are assigned to the assignee of the present invention and all of which are incorporated herein by reference. Even using these methods, however, the size of the trace data can become large, particularly for systems that have a relatively large number of processors and/or a relatively large cache memory.
The present invention overcomes many of the disadvantages of the prior art by providing methods and systems for efficiently predicting the performance of a proposed cache memory within a proposed computer system. This is preferably accomplished by first measuring a number of average actual cache memory performance values using two or more actual computer systems with known cache memory sizes and known processing capabilities. Each of the average actual cache memory performance values is preferably measured using a common predetermined set of instructions.
Once a sufficient number and variety of average actual cache memory performance values are measured, a predicted average cache memory performance value is calculated for a proposed cache memory size. This is preferably accomplished by extrapolating from selected average actual cache memory performance values.
In some systems, such as multi-processor systems, the predicted average cache memory performance value depends on a number of factors including, for example, the size of the proposed cache memory and the processing capability of the proposed computer system. For example, as the proposed cache memory size increases, the load on the cache memory tends to decrease, which increases the performance of the cache memory. Likewise, as the processing capability of a proposed computer system configuration increases, the load on the cache memory tends to increase, which decreases the performance of the cache memory. Thus, it is often desirable to measure average actual cache memory performance values using actual computer systems that have a variety of cache memory sizes and processing capabilities.
Once a sufficient number and variety of average actual cache memory performance values have been measured, a regression analysis may be performed to identify the sensitivity of the average actual cache memory performance values as a function of, for example, cache memory size and processing capability. One way of performing this regression analysis is to fit the average actual cache memory performance values for each processing capability to a separate curve. Each curve then relates the average cache memory performance value for a particular processing capability to cache memory size.
Under some circumstances, it may be desirable to adjust selected curves so that the curves collectively agree with known theoretical relationships. Without this step, some of the curves may produce a result that does not make intuitive sense. For example, it is known that the curves should collectively show that, for a particular cache memory size, the average cache memory performance value should degrade as the processing capability of the computer system increases. This makes intuitive sense because the increased processing capability tends to increase the load on the cache memory, which reduces the performance of the cache memory. Therefore, if one or more of the curves predict a result that does not agree with this expected theoretical relationship, the curves should be refit or otherwise adjusted so that the expected theoretical relationships are followed.
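The adjustment step described above can be sketched in code. The patent does not prescribe a particular adjustment technique; the simple "clamp to the previous value" repair below is only one hypothetical way to force predicted miss rates to be non-decreasing in processing capability at a fixed cache memory size:

```python
def enforce_monotonic_in_processors(predictions):
    """Repair predicted miss rates so they never decrease as processor
    count grows (more processors -> more cache load -> worse performance).

    predictions: list of (n_processors, predicted_miss_rate) tuples,
    sorted by ascending processor count. Any value that dips below its
    predecessor is clamped up to that predecessor (a crude isotonic fix).
    """
    adjusted = []
    prev = float("-inf")
    for n_proc, miss_rate in predictions:
        miss_rate = max(miss_rate, prev)  # clamp violations upward
        adjusted.append((n_proc, miss_rate))
        prev = miss_rate
    return adjusted

# The 2-processor prediction dips below the 1-processor one, which
# contradicts the expected relationship, so it is clamped.
repaired = enforce_monotonic_in_processors([(1, 0.010), (2, 0.009), (4, 0.013)])
```

A production model would more likely refit the offending curve (e.g., with an isotonic or constrained regression) rather than clamp individual points, but the intent is the same: make the family of curves consistent with the known theoretical relationship.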
To make the curve fitting and curve adjusting simpler and more intuitive, it may be beneficial to perform a data transformation on one or more of the variables. In one illustrative embodiment, both the cache memory size and the number of processors are subject to a log (base 2) data transformation.
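The per-processing-capability curve fitting with a log (base 2) transform on cache memory size can be sketched as follows. All measured values, cache sizes, and processor counts here are hypothetical, and the patent does not mandate a linear fit; a straight line in log2 space is used only to keep the illustration small:

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

# Hypothetical measurements: for each processor count, a mapping of
# cache size (MB) -> average measured cache performance value (miss rate).
measurements = {
    1: {1: 0.020, 2: 0.016, 4: 0.012, 8: 0.008},
    4: {1: 0.024, 2: 0.020, 4: 0.016, 8: 0.012},
}

# One curve per processing capability, fit against log2(cache size).
curves = {}
for n_proc, points in measurements.items():
    sizes = sorted(points)
    xs = [math.log2(s) for s in sizes]   # log2 data transformation
    ys = [points[s] for s in sizes]
    curves[n_proc] = fit_line(xs, ys)

def predict(n_proc, cache_size_mb):
    """Extrapolate the fitted curve to a proposed cache memory size."""
    a, b = curves[n_proc]
    return a + b * math.log2(cache_size_mb)
```

With this toy data, extrapolating the 1-processor curve to a proposed 16 MB cache simply continues the measured trend; the number of processors could be log2-transformed the same way if it were folded into a single multivariate fit.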
In one illustrative embodiment, the cache memory performance values represent cache misses. In this embodiment, the average actual cache memory performance value that is measured for each of the two or more actual computer systems may correspond to the average number of cache Misses Per Instruction (MPI) over the predetermined set of instructions. Under many circumstances, the MPI metric has been found to be a relatively good gauge of performance for a cache memory. When using this metric, the predicted average cache memory performance value for the proposed computer system may correspond to the average number of cache Misses Per Instruction (MPI) at the proposed cache memory size.
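The MPI metric itself is just a ratio, which can be stated concretely (the specific counts below are hypothetical, not taken from the patent):

```python
def misses_per_instruction(cache_misses, instructions_executed):
    """Average cache Misses Per Instruction (MPI) over a measured run
    of the predetermined set of instructions."""
    if instructions_executed <= 0:
        raise ValueError("instruction count must be positive")
    return cache_misses / instructions_executed

# e.g. 150,000 misses observed over 10,000,000 executed instructions
mpi = misses_per_instruction(150_000, 10_000_000)
```

A lower MPI indicates better cache performance, which is why the curves discussed above are expected to show MPI falling as cache size grows and rising as processing capability grows.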
Many cache memories are accessed using a number of cache memory request types. Some illustrative cache memory request types include, for example, instruction fetch requests for fetching instructions, data read requests without ownership for fetching data without ownership, data read requests with ownership for fetching data with ownership, ownership requests for requesting ownership, and/or one or more cache management functions or requests.
It has been found that the performance of the cache memory can vary significantly from one cache memory request type to another. Therefore, to increase the accuracy of the predicted cache memory performance, the various contributions for the various cache memory request types may be individually considered. Preferably, an average actual cache memory performance value is measured for each of the cache memory request types using two or more actual computer systems having a variety of cache memory sizes and processing capabilities. Then, a predicted cache memory performance value can be determined for each of the cache memory request types for a proposed cache memory size. This may be accomplished by extrapolating from selected average actual cache memory performance values for corresponding request types.
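The per-request-type treatment can be sketched as one extrapolation per type. The request-type names, cache sizes, and per-type miss contributions below are hypothetical, and the two-point linear extrapolation in log2 space stands in for whatever fitting method is actually used:

```python
import math

# Hypothetical measurements: per request type, cache size (MB) ->
# average measured performance contribution (misses per instruction).
by_type = {
    "instruction_fetch": {4: 0.006, 8: 0.004},
    "data_read":         {4: 0.010, 8: 0.007},
    "ownership":         {4: 0.003, 8: 0.002},
}

def extrapolate(points, cache_size_mb):
    """Linear extrapolation in log2(cache size) from two measured points."""
    (x0, y0), (x1, y1) = [(math.log2(s), points[s]) for s in sorted(points)]
    slope = (y1 - y0) / (x1 - x0)
    return y0 + slope * (math.log2(cache_size_mb) - x0)

# Predicted per-type contributions at a proposed 16 MB cache.
predicted = {t: extrapolate(p, 16) for t, p in by_type.items()}
```

Keeping the request types separate preserves the observation above that cache behavior differs significantly by type; the per-type predictions are recombined later.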
Cache management functions can also significantly affect cache memory performance. One of the cache management functions that is commonly used in multi-processor systems is a cache hit-return function. A cache hit-return function is issued when a first processor attempts to read a location in a cache memory while a second processor has ownership. The cache hit-return function causes the second processor to return the data to the cache memory so that the first processor can access the location. Another cache management function that is commonly used is a cache flush function. A cache flush function is issued when at least a portion of the data stored in a cache memory is flushed from the cache memory to make room for new data. In some cases, at least a portion of the data that is flushed is written back to a main memory. The cache flush function manages this operation. It is recognized that these are only illustrative cache management functions, and other cache management functions may be used to manage the operation of a cache memory.
To properly gauge the performance of a cache memory, such cache management functions may be separately addressed. This may be accomplished by measuring an average actual cache memory performance value for each cache management function. Then, a predicted cache memory performance value can be determined for each cache management function at a proposed cache memory size by extrapolating from the actual cache memory performance values. As above, it may be desirable to adjust selected predicted cache memory performance values for the various cache memory request types and/or cache management functions to agree with known theoretical relationships.
Once all of the desired predicted average cache memory performance values have been calculated for a proposed cache memory size, the overall performance of the proposed cache memory can be estimated by combining the various predicted average cache memory performance values. Preferably, the predicted average cache memory performance values for each of the cache memory request types and/or cache management functions are weighted in accordance with the number of times each will be executed during the execution of the predetermined set of instructions.
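The final combination step is a weighted average, where each request type's (or cache management function's) predicted value is weighted by how often it occurs in the predetermined set of instructions. The per-type values and occurrence counts below are hypothetical:

```python
def combined_performance(predicted_by_type, counts_by_type):
    """Overall predicted cache performance value: each type's prediction
    weighted by the number of times that request type is executed during
    the predetermined set of instructions."""
    total = sum(counts_by_type.values())
    if total <= 0:
        raise ValueError("occurrence counts must sum to a positive value")
    return sum(predicted_by_type[t] * counts_by_type[t]
               for t in predicted_by_type) / total

# Hypothetical per-type predictions and occurrence counts.
overall = combined_performance(
    {"instruction_fetch": 0.002, "data_read": 0.004},
    {"instruction_fetch": 600_000, "data_read": 400_000},
)
```

Here the overall value lands between the two per-type predictions, pulled toward the more frequent instruction-fetch contribution, which is exactly the effect the weighting is meant to capture.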