Generally, a microprocessor operates much faster than main memory can supply it with data. Therefore, many computer systems temporarily store recently and frequently used data in smaller, but much faster, cache memory. Many computers use multi-level cache memory systems with several levels of cache, e.g., level one (L1), level two (L2), level three (L3), etc. L1 cache typically is closest to the microprocessor, smallest in size, and fastest in access time. Typically, as the level of the cache increases (e.g., from L1 to L2 to L3), the cache is further from the microprocessor, larger in size, slower in access time, and supports more microprocessors.
Cache memory architecture may vary in configuration, such as cache size, cache line size, cache associativity, cache sharing, method of writing data to a cache, etc. Cache size refers to the total size of the cache memory. The cache memory is configured to store data in discrete blocks in the cache memory. A block is the minimum unit of information within each level of cache. The size of the block is referred to as the cache line size. The manner in which data is stored in the blocks is referred to as cache associativity. Cache memories typically use one of the following types of cache associativity: direct mapped (one-to-one), fully associative (one-to-all), or set associative (one-to-set).
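As an illustration of how the three types of associativity relate, the following minimal sketch splits a byte address into a tag, a set index, and a line offset. The function name `split_address` and its parameters are hypothetical, and the modulo-based mapping is an assumed, commonly used placement scheme rather than one prescribed here. Direct mapped corresponds to one line per set, and fully associative corresponds to a single set holding every line.

```python
def split_address(addr, cache_size, line_size, ways):
    """Split a byte address into (tag, set_index, offset).

    Hypothetical helper for illustration only.
    ways == 1                       -> direct mapped
    ways == cache_size // line_size -> fully associative (one set)
    anything in between             -> set associative
    """
    num_lines = cache_size // line_size   # total blocks in the cache
    num_sets = num_lines // ways          # blocks are grouped into sets
    offset = addr % line_size             # byte position inside the block
    block = addr // line_size             # block number in memory
    set_index = block % num_sets          # the set the block may live in
    tag = block // num_sets               # identifies the block within its set
    return tag, set_index, offset


# Same address, same 32 KB cache with 64-byte lines, three associativities.
print(split_address(0x12345, 32 * 1024, 64, 1))    # direct mapped
print(split_address(0x12345, 32 * 1024, 64, 4))    # 4-way set associative
print(split_address(0x12345, 32 * 1024, 64, 512))  # fully associative
```

The larger the number of ways, the fewer sets there are, so more of the address ends up in the tag and less in the set index.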
Cache sharing refers to the manner in which data in the blocks is shared. Specifically, L1 cache sharing is the number of processors (physical or virtual) sharing one L1 cache; L2 cache sharing is the number of L1 caches sharing one L2 cache; L3 cache sharing is the number of L2 caches sharing one L3 cache; and so on. Most program instructions involve accessing (reading) data stored in the cache memory; therefore, the cache associativity, cache sharing, cache size, and cache line size are particularly significant to the cache architecture.
Writing to the cache memory (cache write type) is likewise critical to cache architecture, because writing is generally very expensive in terms of processing time. Cache memory generally uses one of the following methods when writing data to the cache memory: “write through, no-write allocate” or “write back, write allocate.”
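The difference between the two write methods can be sketched by counting how many writes reach the next level of the memory hierarchy for a sequence of stores. The function `count_memory_writes`, its parameters, and the direct-mapped organization are assumptions made only for illustration; the sketch also ignores loads and does not flush dirty lines at the end of the run.

```python
def count_memory_writes(stores, num_lines, line_size, policy):
    """Count writes sent to the next memory level for a list of store addresses.

    Hypothetical model: a direct-mapped cache that sees stores only.
    """
    lines = {}        # set index -> (tag, dirty flag)
    memory_writes = 0
    for addr in stores:
        block = addr // line_size
        index, tag = block % num_lines, block // num_lines
        if policy == "write_through":
            # Write through, no-write allocate: every store is forwarded to
            # the next level, and a store miss does not bring the line in.
            memory_writes += 1
        elif policy == "write_back":
            # Write back, write allocate: the store only marks the line dirty.
            # The next level is written when a dirty line is evicted.
            resident = lines.get(index)
            if resident is not None and resident[0] != tag and resident[1]:
                memory_writes += 1
            lines[index] = (tag, True)
        else:
            raise ValueError("unknown policy: " + policy)
    return memory_writes


stores = [0, 8, 16, 0, 8, 256]  # five stores to one line, then a conflicting line
print(count_memory_writes(stores, 4, 64, "write_through"))  # 6
print(count_memory_writes(stores, 4, 64, "write_back"))     # 1
```

Under these assumptions, repeated stores to the same line cost one next-level write each with write through, but only a single write at eviction time with write back.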
The performance of the cache architecture is measured using a variety of parameters, including a miss rate (either load or store), a hit rate, an instruction count, an average memory access time, etc. The miss rate is the fraction of all memory accesses that are not satisfied by the cache memory. There are a variety of miss rates, e.g., intervention, clean, total, “write back,” cast out, upgrade, etc. In contrast, the hit rate is the fraction of all memory accesses that are satisfied by the cache memory. The instruction count is the number of instructions processed in a particular amount of time. The average memory access time is the average amount of time required to access data in a block of the cache memory.
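The hit rate and miss rate defined above are complementary fractions of the same access count, which the following sketch makes explicit. The second function uses the common textbook decomposition of average access time into a hit time plus the miss rate times a miss penalty; that formula, the function names, and the parameter names are assumptions for illustration and are not taken from the description above.

```python
def hit_and_miss_rate(hits, misses):
    """Return (hit_rate, miss_rate) as fractions of all memory accesses."""
    total = hits + misses
    if total == 0:
        raise ValueError("no memory accesses recorded")
    return hits / total, misses / total


def average_access_time(hit_time, miss_rate, miss_penalty):
    """Assumed textbook model: hit time plus expected extra cost of a miss."""
    return hit_time + miss_rate * miss_penalty


hit_rate, miss_rate = hit_and_miss_rate(hits=90, misses=10)
# Hypothetical latencies in cycles, chosen only to exercise the function.
print(hit_rate, miss_rate, average_access_time(1.0, miss_rate, 100.0))
```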
Simulation is a useful tool in determining the performance of a particular cache architecture (i.e., a particular cache size, cache line size, cache associativity, etc.). Simulation of a cache memory may be implemented using a computer system. Thus, given a workload trace (a set of sequences of program instructions linked together, which are executed by microprocessors that emulate sets of typical instructions) and the cache architecture, the performance, e.g., hit/miss rates, of the cache architecture may be simulated.
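A trace-driven simulation of this kind can be sketched in a few lines. The sketch below makes several simplifying assumptions that are not part of the description above: the trace is reduced to a plain list of byte addresses, only a single cache level is modeled, and least-recently-used (LRU) replacement is used. The function name `simulate` and its parameters are hypothetical.

```python
from collections import OrderedDict


def simulate(trace, cache_size, line_size, ways):
    """Replay a list of byte addresses and return (hits, misses).

    Assumed model: one cache level, set associative, LRU replacement.
    """
    num_sets = cache_size // (line_size * ways)
    # One ordered map per set; the oldest entry is the least recently used.
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = misses = 0
    for addr in trace:
        block = addr // line_size
        lines = sets[block % num_sets]
        tag = block // num_sets
        if tag in lines:
            hits += 1
            lines.move_to_end(tag)         # mark as most recently used
        else:
            misses += 1
            if len(lines) >= ways:
                lines.popitem(last=False)  # evict the least recently used
            lines[tag] = True
    return hits, misses


trace = [0, 64, 0, 128, 64]
print(simulate(trace, cache_size=128, line_size=64, ways=2))  # (1, 4)
print(simulate(trace, cache_size=128, line_size=64, ways=1))  # (2, 3)
```

The same trace produces different hit/miss counts for the two configurations, which is the kind of comparison a cache simulation is meant to provide.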
Simulation of the cache architecture typically involves dealing with certain constraints. For example, for a given set of cache architectural components, each with a range of possible values, the number of permutations needed to fully simulate the cache architecture may be very large, which constrains cache simulation. There are often additional constraints. For example, a trace characterizing each processor count of interest is required. However, some traces may be absent, and short traces that provide realistic scenarios may not sufficiently “warm up” large caches, i.e., a trace may not be long enough for the simulation to reach steady-state cache rates. Uncertainty in benchmark tuning is another constraint. Additionally, in the interest of time and cost, usually only a small sample set of cache architectures is simulated.
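The growth in the number of permutations, and the selection of a small sample set from them, can be sketched as follows. The component names and value ranges in `space` are hypothetical placeholders, and simple random sampling is only one assumed way of choosing the sample set.

```python
import random
from itertools import product

# Hypothetical ranges for each cache architectural component.
space = {
    "cache_size_kb": [256, 512, 1024, 2048, 4096],
    "line_size_bytes": [32, 64, 128],
    "associativity": [1, 2, 4, 8],
    "sharing": [1, 2, 4],
    "write_type": ["write_through", "write_back"],
}


def enumerate_configs(space):
    """Return every permutation of component values as a list of dicts."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in product(*space.values())]


def sample_configs(configs, k, seed=0):
    """Pick k configurations at random (assumed sampling method)."""
    return random.Random(seed).sample(configs, k)


configs = enumerate_configs(space)
print(len(configs))                     # 5 * 3 * 4 * 3 * 2 = 360
print(len(sample_configs(configs, 10))) # 10 configurations to simulate
```

Even with these few components the full space has 360 configurations, and each added component or value multiplies that count.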
Once the simulation is performed on the small sample set of cache architectures, statistical analysis is used to estimate the performance of the cache architectures that are not simulated. The quality of the statistical analysis relies on the degree to which the sample set is representative of the sample space, i.e., the permutations for a given set of cache architectural components. Sample sets are generated using probabilistic and non-probabilistic methods. Inferential statistics, along with data obtained from the sample set, are then used to model the sample space for the given architectural components. Models are typically used to extrapolate from the data obtained from the sample set. The models used are typically univariate or multivariate in nature. A univariate model analyzes a single variable and is generally useful for describing relevant aspects of the data. A multivariate model analyzes one variable contingent on the measurements of other variables. Further, the models used to fit the data of the sample set may be smoothed models obtained using a plurality of algorithms.
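As one simple instance of fitting a model to the sample set and using it to estimate an unsimulated configuration, the sketch below performs an ordinary least-squares fit of a straight line with a single predictor. The choice of a linear model, the function names `fit_line` and `predict`, and the idea of using something like the logarithm of the cache size as the predictor are all assumptions for illustration; the models actually used may be multivariate or smoothed.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Evaluate the fitted line at a predictor value that was not simulated."""
    slope, intercept = model
    return slope * x + intercept


# xs would hold a predictor for each simulated configuration (for example,
# log2 of the cache size) and ys the simulated rate; values here are
# placeholders that only exercise the code.
model = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(predict(model, 5.0))
```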
System model simulators are often used in designing computer system architectures. For example, closed queuing networks may be used to create a logical network that models the handling of memory requests made by microprocessors of a multi-processor computer system. A memory request takes a route through the logical network, where the route taken by the memory request is determined in part by inputs to the system model simulator.
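A closed queuing network of this kind can be evaluated in several ways. The sketch below uses exact mean value analysis for a single class of requests, which is one standard technique and is only assumed here; the description above does not state which technique the system model simulator uses. The function name, the interpretation of stations as components such as a bus or a memory bank, and the service demands are hypothetical.

```python
def mean_value_analysis(demands, customers, think_time=0.0):
    """Exact mean value analysis for a single-class closed queuing network.

    demands:    service demand at each queuing station (time per request)
    customers:  number of requests circulating in the closed network
    think_time: time a request spends outside the stations per cycle
    Returns (throughput, mean queue length at each station).
    """
    queue = [0.0] * len(demands)
    throughput = 0.0
    for n in range(1, customers + 1):
        # An arriving request sees the queue left by n - 1 other requests.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        throughput = n / (think_time + sum(residence))
        queue = [throughput * r for r in residence]  # Little's law per station
    return throughput, queue


# Two hypothetical stations with equal demand and two circulating requests.
throughput, queue = mean_value_analysis([1.0, 1.0], customers=2)
print(throughput, queue)
```

In such a model, the workload characteristics would determine the service demands, so changes in miss rates change the predicted throughput.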
FIG. 1 shows the system model simulator (30), which generates a system model output (32) that may be used to predict performance and to help resolve architectural tradeoffs. One input to the system model simulator (30) is a set of workload characteristics, which is generally a cache simulation output (34). The system model simulator (30) also has other inputs (36), which are often fixed, such as cache and memory latencies or bus widths.
The cache simulation output (34) includes cache operational parameters in the form of rates per instruction for the multi-level cache hierarchy, including a load miss rate, a store miss rate, a load write back rate, and other rate per instruction parameters of the multi-level cache hierarchy. For example, the cache simulation output for a typical cache memory architecture may have a store miss rate of 0.37% and a load miss rate of 0.71%.
Factors such as cache simulation constraints (e.g., benchmark tuning, trace collection, trace warm-up, etc.) may introduce uncertainties into the cache simulation output (34). For example, traces for simulating different sets of inputs for different configurations (e.g., for different numbers of microprocessors or for different cache sizes) are often collected by different experts in potentially different settings. The system model output (32) may be affected by such input uncertainties, i.e., uncertainties included in the cache simulation output (34).