Caching is a mechanism that accelerates access (i.e., reduces latency) to data on slower storage media by managing a subset of the data in a smaller, faster, and typically more expensive storage medium. As a result, the average latency of the data access will be between the latency of the slower and faster storage. Caches come in many shapes and forms. For example, caches can be embodied in hardware (such as CPU caches) and/or software (such as Memcached). In some cases, caches can also be layered across several storage layers or tiers.
Typically, data access is not uniformly random. Instead, data access often has spatial and/or temporal locality properties. Spatial locality refers to accessing data within a relatively close or similar location to other data that has previously been accessed. Temporal locality refers to the reuse of data that has previously been accessed within a relatively close time period. These locality properties give a measure of predictability to the data that will be accessed, allowing only those data elements predicted to be recalled soon to be stored in the smaller cache.
The ratio of all data accesses that can be served by the cache is called the hit ratio, and is one of the main metrics of success of a cache. Hit ratios can have a significant impact on performance and, as a result, have high economic implications. While there have been many efforts to come up with better ways to determine which items to store in the cache and which to evict to make room for more likely items, the traditional caching policies have been typically based on coarse locality metrics (e.g., oldest items are always evicted). Consequently, data which is more likely to be accessed may be evicted. As such, there are a number of challenges and inefficiencies found in traditional caching policies.