As processor performance continues to outpace memory capacity and bandwidth, system and application performance has become constrained by the memory subsystem. As the processor community has moved to parallelism to stay on the performance curve, memory capacity and bandwidth have become key to keeping parallel processors and their cores operating efficiently. Promising new technologies, e.g., Phase Change Memory and Flash, have emerged that add capacity at a lower cost than conventional DRAM. These new technologies, however, add latency and exhibit poor endurance. Systems that use these new memory technologies in the memory subsystem will require innovative memory system architectures to gain the benefit of added capacity while mitigating the costs of latency and potential device wear-out.
Evaluating the trade-offs in architecture design decisions for these sophisticated, high-capacity memory systems requires long-term knowledge of application behavior. One common method for obtaining this knowledge uses system trace data from a running application to determine the read and write patterns of the application. However, these traces can be prohibitively large even for the shortest time scales, and collecting them often perturbs the running application itself. Furthermore, to understand how an application would use a large memory system, the system needs to be monitored or traced for a long time as the application runs. Some designers instead use modeling and simulation with synthetic memory access patterns to evaluate design decisions. Synthetic patterns, however, may not represent real application behavior accurately enough to support those decisions.
Traditionally, the architecture design or memory configuration was fixed for a given system. The available capabilities were not sufficient to justify the complexity that dynamically reconfiguring the operation of the memory would likely introduce. However, recent developments enable more diverse memory subsystems that integrate memory components of different types, and application workloads exhibit more diverse behavior. As a result, different architectures and memory configurations are now viewed as beneficial. These different architectures include larger memory caches. Technology trends are enabling last-level caches that are significantly larger than those that currently exist.
The performance of the memory subsystem directly affects the performance of applications that use the memory subsystem. Memory subsystem performance depends on workload parameters and on the configuration parameters, i.e., the architecture, of the memory subsystem. The memory subsystem configuration parameters include, for example, cache size, memory size, line size, block size and associativity. Identifying and quantifying this dependence using performance models helps in understanding how the performance of the memory subsystem, and therefore application performance, depends on the memory subsystem configuration parameters. This understanding provides guidelines for setting memory subsystem configuration parameters for a target application or set of applications.
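The dependence of performance on configuration parameters can be illustrated with a minimal sketch. The following Python code is not taken from any particular tool; it assumes a set-associative cache with least-recently-used (LRU) replacement, and the names simulate_cache and synthetic_trace, as well as the uniform random access pattern, are hypothetical choices made only for illustration.

```python
from collections import OrderedDict
import random


def simulate_cache(trace, cache_size, line_size, associativity):
    """Return the miss ratio of a set-associative LRU cache for an address trace.

    All sizes are in bytes. LRU replacement is an assumption of this sketch.
    """
    num_sets = (cache_size // line_size) // associativity
    if num_sets < 1:
        raise ValueError("cache_size too small for this line_size and associativity")
    if not trace:
        return 0.0

    # One ordered dictionary per set; insertion order tracks recency of use.
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        block = addr // line_size      # memory block that contains the address
        index = block % num_sets       # set the block maps to
        tag = block // num_sets        # identifies the block within its set
        lines = sets[index]
        if tag in lines:
            lines.move_to_end(tag)     # hit: mark as most recently used
        else:
            misses += 1
            if len(lines) >= associativity:
                lines.popitem(last=False)  # evict the least recently used line
            lines[tag] = True
    return misses / len(trace)


def synthetic_trace(length, working_set_bytes, seed=0):
    """Hypothetical workload: uniformly random byte addresses in a working set."""
    rng = random.Random(seed)
    return [rng.randrange(working_set_bytes) for _ in range(length)]


if __name__ == "__main__":
    trace = synthetic_trace(length=100_000, working_set_bytes=1 << 20)
    for cache_size in (64 << 10, 256 << 10, 1 << 20):
        ratio = simulate_cache(trace, cache_size, line_size=64, associativity=4)
        print(f"cache_size={cache_size:>8} bytes  miss_ratio={ratio:.3f}")
```

Varying cache_size, line_size or associativity while holding the trace fixed shows how each configuration parameter changes the miss ratio for a given workload.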
Traditionally, cache effectiveness has been modeled through trace-driven simulation tools. In addition to the shortcomings of trace-driven simulations described above, these tools are not up to the task of simulating very large caches. Typical cache sizes modeled using trace-driven simulations are on the order of megabytes. Because of the limited length of available traces, the tools cannot capture behavior across long enough periods of time. Apart from the limitations of trace-driven simulations, the performance models that connect memory subsystem performance to configuration parameters are quite limited. These performance models lack an explicit functional characterization and make available only some observations from experiments. Extrapolation from empirical data based on these observations produces a variety of problems, including limited extrapolation, usually with respect to a single configuration parameter; the requirement for a large number of runs with several different configuration parameters; difficulty in capturing the inter-dependence of different performance metrics; difficulty in capturing the fine-grained sensitivity of performance metrics to changes in configuration parameters; and difficulty in characterizing the robustness of performance to configuration parameter settings.
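The requirement for a large number of runs can be made concrete with a short sketch. The Python code below assumes a hypothetical grid of candidate values (the specific values in param_grid are illustrative only) and compares the number of simulation runs needed to cover every combination of parameters against the number needed when only one parameter is varied at a time around a baseline.

```python
from itertools import product
from math import prod

# Hypothetical candidate values for each configuration parameter.
param_grid = {
    "cache_size": [1 << 20, 4 << 20, 16 << 20, 64 << 20],  # bytes
    "line_size": [64, 128, 256],                            # bytes
    "associativity": [4, 8, 16],
}


def configurations(grid):
    """Yield every combination of parameter values as a dictionary."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def full_grid_runs(grid):
    """Runs needed to observe every combination of parameter values."""
    return prod(len(values) for values in grid.values())


def one_at_a_time_runs(grid):
    """Runs needed for one baseline plus varying each parameter alone."""
    return 1 + sum(len(values) - 1 for values in grid.values())


if __name__ == "__main__":
    print("full grid runs:       ", full_grid_runs(param_grid))
    print("one-at-a-time runs:   ", one_at_a_time_runs(param_grid))
```

The full grid grows multiplicatively with each added parameter, while the cheaper one-at-a-time sweep observes each parameter only in isolation and so cannot expose the inter-dependence among parameters.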