Some web sites store many documents, a subset of which is accessed frequently. For example, social network web sites can store billions of photographs, and a subset of these photographs can be very frequently accessed for an initial period of time (e.g., soon after the photograph is first “shared” with friends) and then rarely accessed thereafter. Web site operators, e.g., operators of social network web sites, can employ caching techniques to reduce the overhead associated with accessing these documents, but must manage a tradeoff between deploying very many web site servers to enable quick access to these documents and deploying large caches to reduce load on the servers. These tradeoffs can affect speed and cost. For example, caching can be done using relatively more expensive hardware, e.g., solid state drives (SSDs) instead of disk drives, to provide faster speeds.
An example of a caching algorithm is the “least recently used” (LRU) algorithm. An LRU cache discards the least recently used items first. For example, a photograph that is frequently accessed may remain in a cache for an extended period, whereas a photograph that is accessed rarely may be discarded from the cache. Other advanced caching algorithms also exist. The LRU algorithm can keep an initially posted photograph in the cache and then eventually discard the photograph from the cache as the photograph becomes less frequently accessed.
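The LRU behavior described above can be illustrated with a minimal sketch. The class name `LRUCache`, its `get` and `put` methods, and the fixed `capacity` parameter are assumptions made for illustration only; they do not describe any particular web site's cache.

```python
from collections import OrderedDict


class LRUCache:
    """Illustrative LRU cache: evicts the least recently used item first."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity          # hypothetical fixed item limit
        self._items = OrderedDict()       # oldest access first, newest last

    def get(self, key):
        """Return the cached value, or None on a cache miss."""
        if key not in self._items:
            return None
        self._items.move_to_end(key)      # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        """Insert or update an item, evicting the LRU item if over capacity."""
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)   # discard least recently used


cache = LRUCache(capacity=2)
cache.put("photo_a", b"...")
cache.put("photo_b", b"...")
cache.get("photo_a")               # photo_a is now the most recently used
cache.put("photo_c", b"...")       # photo_b is discarded
```

In this sketch a photograph that keeps being requested stays at the recent end of the ordering, while one that is no longer requested drifts to the old end and is eventually discarded.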
Priority queues can be used to implement various advanced caching algorithms. A priority queue is a data type that is like a regular queue or stack data structure, but where additionally each element has a priority associated with it. To implement an LRU algorithm using a priority queue, priorities of items in the queue can be updated when the items are accessed. Thus, an item that is accessed infrequently would eventually have a low priority and so may be discarded from the queue. On the other hand, an item that is accessed frequently will eventually have a high priority and so may remain in the queue longer than other items.
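As a hedged illustration of the paragraph above, the following sketch implements LRU eviction on top of a binary-heap priority queue. The class name `PriorityQueueLRU`, the use of a logical access counter as the priority, and the lazy invalidation of stale heap entries are all assumptions chosen for brevity, not a description of any specific implementation.

```python
import heapq
import itertools


class PriorityQueueLRU:
    """Illustrative LRU cache backed by a min-heap priority queue.

    An item's priority is the logical time of its latest access, so the
    lowest-priority item is the least recently used one.
    """

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._values = {}                 # key -> cached value
        self._priority = {}               # key -> current priority
        self._heap = []                   # (priority, key), may hold stale entries
        self._clock = itertools.count()   # hypothetical logical access counter

    def _touch(self, key):
        # Updating a priority pushes a new heap entry; the old entry is
        # left in place and skipped later because it no longer matches.
        priority = next(self._clock)
        self._priority[key] = priority
        heapq.heappush(self._heap, (priority, key))

    def get(self, key):
        if key not in self._values:
            return None
        self._touch(key)                  # each access raises the priority
        return self._values[key]

    def put(self, key, value):
        if key not in self._values and len(self._values) >= self.capacity:
            self._evict()
        self._values[key] = value
        self._touch(key)

    def _evict(self):
        # Pop until an entry matches the item's current priority.
        while self._heap:
            priority, key = heapq.heappop(self._heap)
            if self._priority.get(key) == priority:
                del self._values[key]
                del self._priority[key]
                return key
        return None
```

Note that in this sketch every access, including a read, modifies the queue. The stale entries also accumulate until they are popped, so a production design would need to compact the heap periodically.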
When using SSDs to store data, it is generally desirable to minimize the number of write operations to reduce an undesirable effect referred to as “write amplification.” Write amplification occurs because, unlike with magnetic disks, data in flash memory must be erased before it is rewritten. This erasing and rewriting can cause data to be moved within the SSD, which results in many more writes than the initial write operation. Thus, when an application changes the priority of an item in a priority queue, the SSD may perform multiple writes. It is desirable to reduce the number of writes because SSDs (and flash memory generally) can have a limited write lifetime. Moreover, it can be advantageous to avoid operations that prevent efficient use of SSDs.
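The effect can be made concrete with a deliberately simplified model, offered only as a sketch. It assumes that the SSD reclaims space one erase block at a time and must copy the still-valid pages of a block elsewhere before erasing it. The function name `write_amplification` and the parameters `pages_per_block` and `valid_pages` are hypothetical, and real devices behave in more complicated ways.

```python
def write_amplification(pages_per_block, valid_pages):
    """Ratio of flash page writes to host page writes in a simplified model.

    Reclaiming a block that still holds `valid_pages` valid pages frees
    `pages_per_block - valid_pages` pages for new host data, but the valid
    pages must first be copied to another block.
    """
    if not 0 <= valid_pages < pages_per_block:
        raise ValueError("valid_pages must be in [0, pages_per_block)")
    freed_pages = pages_per_block - valid_pages
    # Flash writes = new host pages written + valid pages relocated.
    return (freed_pages + valid_pages) / freed_pages


# A block with no valid data can be erased without any extra copying.
print(write_amplification(pages_per_block=64, valid_pages=0))    # 1.0
# A block that is three-quarters valid costs four flash writes per host write.
print(write_amplification(pages_per_block=64, valid_pages=48))   # 4.0
```

Under these assumptions, small scattered updates, such as frequent priority changes, tend to leave blocks mostly valid when they are reclaimed, which is the case in which the ratio is highest.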