Caches designed to accelerate data access by exploiting locality are pervasive in modern storage systems. Operating systems (OS's) and databases maintain in-memory buffer caches containing “hot” blocks considered likely to be reused. When an OS needs to access a block, it may first look in a cache. If the block is cached, there is a “hit” and the OS can access it right away. If, however, the block is not in the cache (a “miss”), then the OS must access it using the normal addressing techniques to retrieve the block from slower memory or storage. Server-side or networked storage caches using flash memory are popular as a cost-effective way to reduce application latency and offload work from rotating disks. Virtually all storage devices—ranging from individual disk drives to large storage arrays—include significant caches composed of RAM or flash memory. Since cache space consists of relatively fast, expensive storage, it is inherently a scarce resource, and is commonly shared among multiple clients. As a result, optimizing cache allocations is important, and approaches for estimating workload performance as a function of cache size are particularly valuable.
Cache Utility Curves
Cache utility curves (CUCs) are effective tools for managing cache allocations. Such curves plot a performance metric as a function of cache size. FIG. 1 shows an example miss-ratio curve (MRC, which may also abbreviate miss rate curve), which plots the ratio of cache misses to total references for a workload (y-axis) as a function of cache size (x-axis). The higher the miss ratio, the worse the performance; furthermore, the miss ratio decreases as cache size increases. MRCs come in many shapes and sizes, and represent the historical cache behavior of a particular workload. The MRC example of FIG. 1 illustrates the inherent trade-off: One can reduce the likelihood of misses by making the cache larger, but this leads to greater cost to provide the faster devices that are used for the cache. Instead of evaluating miss ratios as a function of cache size, some other known systems evaluate miss rates and thus miss rate curves, which have analogous properties and can provide similar information to system designers. Both miss ratio curves and miss rate curves are thus different choices for CUCs.
Assuming some level of stationarity in the workload pattern at the time scale of interest, its MRC can be used to predict its future cache performance. An administrator can use a system-wide miss ratio curve to help determine the aggregate amount of cache space to provision for a desired improvement in overall system performance. Similarly, an automated cache manager can utilize separate MRCs for multiple workloads of varying importance, optimizing cache allocations dynamically to achieve service-level objectives.
Weaker Alternatives
The concept of a working set, defined as the set of data accessed during the most recent sample interval, is often used by online allocation algorithms in systems software. While working-set estimation provides valuable information, it doesn't measure data reuse, nor does it predict the magnitude of the performance change that can be expected as cache allocations are varied. Without the type of information conveyed in a cache utility curve, administrators or automated systems seeking to optimize cache allocations are forced to resort to simple heuristics, or to engage in trial-and-error tests. Both approaches are problematic.
Heuristics simply don't work well for cache sizing, since they cannot capture the temporal locality profile of a workload. Without knowledge of marginal benefits, for example, doubling (or halving) the cache size for a given workload may change its performance only slightly, or by a dramatic amount.
Trial-and-error tests that vary the size of a cache and measure the effect are not only time-consuming and expensive, but also present significant risk to production systems. Correct sizing requires experimentation across a range of cache allocations; some might induce thrashing and cause a precipitous loss of performance. Moreover, long-running experiments required to warm up caches or to observe business cycles may exacerbate the negative effects. In practice, administrators rarely have time for this.
Although CUCs are useful for planning and optimization, existing algorithms used to construct them are computationally expensive. To construct an exact MRC, it is necessary to observe data reuse over the access trace. Every accessed location must be tracked and stored in data structures during trace processing, resulting in large overheads in both time and space. One technique due to Mattson, et al., (“Evaluation techniques for storage hierarchies”, IBM Syst. J. 9, 2 (June 1970), 78-117) scans the trace of references to collect a histogram of reuse distances. The reuse distance for an access to a block B is measured as the number of other intervening unique blocks referenced since the previous access to B. The number of times a particular reuse distance occurs is collected while processing the trace, over all possible reuse distances. Conceptually, for modeling LRU (“Least Recently Used”), accessed blocks are totally ordered in a stack from most recent to least recent access. On an access to block B, it:                determines the reuse distance of B as: D=stack depth of B (for first access to B, D=∞),        records D in a reuse-distance histogram, and        moves B to the top of stack.        
Standard implementations maintain a balanced tree to track the most recent references to each block and compute reuse distances efficiently, and employ a hash table for fast lookups into this tree. For a trace of length N containing M unique references, the most efficient implementations of this algorithm have an asymptotic cost of (N log M) time and (M) space.
Given the non-linear computation cost and unbounded memory requirements, it is impractical to perform real-time analysis in production systems. Even when processing can be delayed and performed offline from a trace file, memory requirements may still be excessive. This is especially important when modeling large storage caches; in contrast to RAM-based caches, affordable flash cache capacities often exceed 1 TB, requiring many gigabytes of RAM for traditional MRC construction.