1. Field of the Invention
The present invention relates to techniques for modeling the performance of computer systems. More specifically, the present invention relates to a method and an apparatus for selecting bases to form a regression model that can be used to extrapolate cache performance.
2. Related Art
To increase the speed of memory references, computer systems typically provide multiple levels of caches, which are used to store frequently used operands and instructions closer to the processor to achieve faster access times. There are a large number of tradeoffs between cache design variables, such as cache size, line size, associativity, and amount of sharing. Hence, the process of designing a cache requires considering many possible cache designs. Moreover, while optimizing the design of a cache, it is desirable to be able to estimate the performance of different cache designs without actually having to implement them.
Performance estimates for various cache designs are typically generated using system performance models driven by trace data from one or more benchmarks of interest. These performance models are typically used to produce specific cache event rates. For example, a performance model can determine how often an instruction generates a read or a write request that misses in a given level of cache memory.
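As an illustration of how a trace-driven model can produce a cache event rate, the following is a minimal sketch of a set-associative cache with least-recently-used replacement. It is not the performance model described here; the function name `miss_rate`, its parameters, and the assumption that the trace is a simple list of byte addresses are all hypothetical.

```python
from collections import OrderedDict

def miss_rate(trace, cache_size, line_size, associativity):
    """Fraction of references in `trace` that miss in a set-associative LRU cache.

    Hypothetical helper: `trace` is assumed to be a list of byte addresses,
    and `cache_size` is assumed to be a multiple of line_size * associativity.
    """
    if not trace:
        return 0.0
    num_sets = cache_size // (line_size * associativity)
    # One ordered dict per set; insertion order tracks recency of use.
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for address in trace:
        line = address // line_size
        index = line % num_sets
        tag = line // num_sets
        ways = sets[index]
        if tag in ways:
            ways.move_to_end(tag)          # hit: mark as most recently used
        else:
            misses += 1
            if len(ways) >= associativity:
                ways.popitem(last=False)   # evict the least recently used line
            ways[tag] = True
    return misses / len(trace)
```

For example, `miss_rate([0, 128, 0, 128], 128, 64, 1)` models two addresses that conflict in a direct-mapped cache, whereas the same trace with `associativity=2` lets both lines reside in the cache after their first reference.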
Unfortunately, not all system configurations can be modeled using trace data. For example, only configurations with at most as many processors as in the trace collection system can be simulated. Therefore, it is not possible to simulate future designs which have more processors than the existing systems that were used to generate the trace data. Additionally, configurations with large numbers of processors sharing a cache cannot be simulated accurately. Furthermore, prohibitively long traces are required to simulate large caches due to cache “warming” requirements. Consequently, a system designer is typically required to extrapolate cache event rates from smaller systems to obtain performance numbers for systems with larger numbers of processors, larger degrees of sharing and/or larger cache sizes.
Statistical regression models can be used to extrapolate these cache rates. However, the accuracy of such a regression model is highly dependent on the choice of bases for the regression model. Some bases fit the empirical data well, but are relatively unstable in the extrapolation region, whereas other bases are stable in the extrapolation region but fit the empirical data less well.
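The tradeoff between fit and stability can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the data are synthetic (an assumed trend that is linear in the logarithm of cache size, plus a small alternating perturbation), the two candidate basis sets are hypothetical, and the helpers `fit`, `predict`, and `sse` are not taken from any particular regression method.

```python
import math

def fit(bases, xs, ys):
    """Least-squares coefficients for y ~ sum(c_i * bases[i](x)), via normal equations."""
    rows = [[b(x) for b in bases] for x in xs]
    n = len(bases)
    # Augmented normal-equation matrix [A^T A | A^T y].
    m = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def predict(coeffs, bases, x):
    return sum(c * b(x) for c, b in zip(coeffs, bases))

def sse(coeffs, bases, xs, ys):
    """Sum of squared residuals over the fitted (empirical) points."""
    return sum((predict(coeffs, bases, x) - y) ** 2 for x, y in zip(xs, ys))

# Synthetic data (assumption): cache sizes in arbitrary units and an assumed
# underlying trend, perturbed by a small alternating error.
def trend(x):
    return 10.0 - 1.5 * math.log2(x)

sizes = [1, 2, 4, 8, 16]
noise = [0.2, -0.2, 0.2, -0.2, 0.2]
rates = [trend(x) + e for x, e in zip(sizes, noise)]

# Two hypothetical basis sets: a small one and a richer one that contains it.
simple = [lambda x: 1.0, lambda x: math.log2(x)]
rich = simple + [lambda x, k=k: math.log2(x) ** k for k in (2, 3, 4)]

simple_coeffs = fit(simple, sizes, rates)
rich_coeffs = fit(rich, sizes, rates)

simple_sse = sse(simple_coeffs, simple, sizes, rates)
rich_sse = sse(rich_coeffs, rich, sizes, rates)

target = 64  # well outside the fitted range
simple_pred = predict(simple_coeffs, simple, target)
rich_pred = predict(rich_coeffs, rich, target)

print(f"simple: sse={simple_sse:.3f}  extrapolated={simple_pred:.2f}")
print(f"rich:   sse={rich_sse:.3f}  extrapolated={rich_pred:.2f}")
print(f"assumed trend at {target}: {trend(target):.2f}")
```

On this synthetic data, the five-term basis passes through every empirical point, so its residual is essentially zero, yet its extrapolated value lies far from the assumed trend. The two-term basis leaves a larger residual but stays close to the trend in the extrapolation region.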
Hence, what is needed is a method and an apparatus for selecting bases to form a regression model, which strikes a balance between fitting the empirical data and stability in the extrapolation region.