In existing, well-known write caching systems, data is transferred from a host into a cache on a storage controller. The data is retained temporarily in the cache until it is subsequently written (“destaged”) to a disk drive or RAID array.
In order to select the region of data to destage next, the controller firmware uses an LRU (Least Recently Used) algorithm. The use of an LRU algorithm increases the probability of the following advantageous events happening to the data in the cache.
   1. Data in the cache may be overwritten with updated data before being destaged, so that multiple write operations from the host result in only one destage operation to the disk, thereby reducing disk utilisation.
   2. Data in the cache may be combined with logically-adjacent data (coalesced) to form a complete stride for destaging to a RAID 5 array, thereby avoiding the read-modify-write penalty typically encountered when writing to a RAID 5 array.
   3. An attempt by the host to read data which it has recently written may be serviced from the cache without the overhead of retrieving the required data from the disk. This improves the read response time.
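The three effects listed above can be illustrated with a minimal sketch in Python. It is not taken from any particular controller firmware: the class name WriteCache, its methods, the dictionary standing in for the disk, and the choice to refresh recency on a read hit are all assumptions made for illustration. For simplicity the sketch drops an entry from the cache once it has been destaged.

```python
from collections import OrderedDict


class WriteCache:
    """Toy write cache keyed by logical block address (hypothetical model)."""

    def __init__(self):
        # Ordered from least recently used (front) to most recently used (back).
        self._entries = OrderedDict()
        self.destage_count = 0

    def write(self, lba, data):
        # An overwrite replaces the cached data and refreshes its recency,
        # so several host writes can end in a single destage.
        self._entries[lba] = data
        self._entries.move_to_end(lba)

    def read(self, lba):
        # Return cached data, or None to signal that the disk must be read.
        # Assumption: a read hit also refreshes recency.
        if lba in self._entries:
            self._entries.move_to_end(lba)
            return self._entries[lba]
        return None

    def full_strides(self, stride_blocks):
        # Stride numbers for which every block is cached, so the stride
        # could be written whole without a read-modify-write.
        counts = {}
        for lba in self._entries:
            stride = lba // stride_blocks
            counts[stride] = counts.get(stride, 0) + 1
        return sorted(s for s, n in counts.items() if n == stride_blocks)

    def destage_next(self, disk):
        # Destage the least recently used entry to the backing store.
        lba, data = self._entries.popitem(last=False)
        disk[lba] = data
        self.destage_count += 1
        return lba


disk = {}
cache = WriteCache()
cache.write(10, b"old")
cache.write(20, b"other")
cache.write(10, b"new")      # overwrite before destage (effect 1)
hit = cache.read(10)         # served from the cache (effect 3)
first = cache.destage_next(disk)   # block 20 is now the least recently used
```

In this run three host writes lead to at most two destage operations, because the second write to block 10 replaces the first while it is still cached.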
Data in the cache must be protected against loss during unplanned events (e.g. resets or power outages). This is typically achieved by including battery-backed memory or a UPS (uninterruptible power supply) to allow the data to be retained during such events.
However, providing such backup power is difficult and expensive, so controllers are often designed without sufficient backup power to retain the contents of all of their cache memory. Consequently, the controller has areas of cache memory which cannot be used for write caching (since the data stored therein would be vulnerable to loss).
Such areas of the cache may, however, be used as a read cache (since this data does not need to be written to the storage device). Such a read cache would be used independently of the write cache.
When a write is received from the host and data is transferred into the cache, the data is known as “dirty” data. Some time later it is destaged to the disk but may be retained in the cache, whereupon it is known as “clean” data.
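The dirty and clean states can be modelled as a simple two-state transition. The following Python sketch is illustrative only: the names State, CacheEntry, host_write and destage are hypothetical, and plain dictionaries stand in for the cache and the disk.

```python
from enum import Enum


class State(Enum):
    DIRTY = "dirty"   # written by the host, not yet on disk
    CLEAN = "clean"   # an identical copy now exists on disk


class CacheEntry:
    def __init__(self, data):
        self.data = data
        self.state = State.DIRTY   # every host write starts out dirty


def host_write(cache, lba, data):
    # A new write (or an overwrite) always leaves the entry dirty.
    cache[lba] = CacheEntry(data)


def destage(cache, disk, lba):
    # Copy dirty data to disk; the entry stays in the cache as clean data.
    entry = cache[lba]
    if entry.state is State.DIRTY:
        disk[lba] = entry.data
        entry.state = State.CLEAN


cache, disk = {}, {}
host_write(cache, 7, b"payload")
destage(cache, disk, 7)
```

After the destage the entry is still present in the cache, so it remains available to satisfy later reads.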
If a read command is received from the host for the region of memory corresponding to the cached data then the read command may be satisfied from the clean data in the cache, or a combination of contiguous clean and dirty spans of data.
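A read that spans both clean and dirty data can be sketched as follows. This is a simplified illustration under stated assumptions: spans are hypothetical (start, data, state) tuples that do not overlap, one byte stands for one block, and the function name read_from_cache is invented here. The read is a hit only if the cached spans cover the requested range without a gap, whatever their state.

```python
def read_from_cache(spans, start, length):
    """Return the requested bytes if cached spans cover them, else None.

    spans: list of (start_block, data, state) tuples, assumed non-overlapping.
    """
    if length <= 0:
        return b""
    end = start + length
    pos = start
    out = bytearray()
    for span_start, data, _state in sorted(spans, key=lambda s: s[0]):
        span_end = span_start + len(data)
        if span_end <= pos:
            continue          # span lies wholly before the unread part
        if span_start > pos:
            return None       # gap in the cache: the disk must be read
        take_end = min(span_end, end)
        out += data[pos - span_start:take_end - span_start]
        pos = take_end
        if pos >= end:
            return bytes(out)
    return None               # ran out of spans before the range was covered


spans = [(0, b"AAAA", "clean"), (4, b"BB", "dirty"), (8, b"CC", "clean")]
hit = read_from_cache(spans, 2, 4)    # crosses the clean/dirty boundary
miss = read_from_cache(spans, 5, 4)   # blocks 6 and 7 are not cached
```

The state of each span is deliberately ignored when assembling the result, since clean and dirty data are equally valid for servicing a read.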
The clean data in the cache needs to be discarded at some point to allow higher-priority clean data to be retained. The problem is selecting the next clean data entry to discard. This process is known as purging.
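To make the purging step concrete, the Python sketch below discards one clean entry at a time. The text does not state at this point how the entry should be chosen, so the least-recently-used order used here is only a placeholder policy, and the name purge_one and the (data, is_dirty) tuple layout are assumptions. The one property the sketch does rely on is that dirty entries are never discarded, since they have not yet been written to disk.

```python
from collections import OrderedDict


def purge_one(entries):
    """Discard one clean entry and return its address, or None if none exist.

    entries: OrderedDict mapping address -> (data, is_dirty), ordered from
    least to most recently used (placeholder ordering, see the text above).
    """
    for lba, (_data, is_dirty) in entries.items():
        if not is_dirty:
            del entries[lba]   # safe: we return before iterating further
            return lba
    return None


entries = OrderedDict([
    (1, (b"a", True)),    # dirty: must be kept until destaged
    (2, (b"b", False)),   # clean: eligible for purging
    (3, (b"c", False)),   # clean: eligible for purging
])
purged = purge_one(entries)
```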