Caching is a mechanism that accelerates data access from storage media by managing a subset of the data in a smaller, faster, and typically more expensive storage medium. Caches come in many shapes and forms. Caches may be embodied, for example, in hardware (e.g., CPU caches) and/or in software (e.g., Memcached). In some cases, caches may also be layered across several storage layers or tiers.
One of the main metrics of successful cache implementation is the cache hit ratio. The cache hit ratio is the ratio of all data accesses that may be served by the cache. For example, a webserver of a social networking service typically stores frequently requested content in a cache memory to serve readily to users upon access requests. If the content requested is not in the cache (i.e., “cache miss” instead of “cache hit”), the content has to be fetched from storage, and a computing delay results. An access management system, operated by the webserver, follows a cache replacement policy (e.g., LRU, CRU, etc.) to determine which content to retain (or replace) in the cache memory, such that the high cache hit ratio may be achieved upon access requests from users.
Many efforts have been attempted to improve cache replacement policies with the goal to retain items that are highly likely to be requested from the cache. Typically, these cache replacement policies (e.g., LRU) have been based on temporal metrics (e.g., oldest items are always evicted to make room for new items). As such, data with a high likelihood of being immediately requested may be evicted in some case, resulting in low cache-hit ratio, computing inefficiency, and poor user experience.