Data storage and retrieval systems are used to store and retrieve information on behalf of one or more host computer systems. Such data storage and retrieval systems receive requests from a host computer system to write information to one or more secondary storage devices, and requests to retrieve information from those one or more secondary storage devices. Upon receipt of a write request, the system stores information received from a host computer in a data cache; the cached copy can then be written to other storage devices connected to the system, such as connected nonvolatile storage devices. Upon receipt of a read request, the system recalls one or more data tracks from the one or more secondary storage devices and moves those tracks to the data cache.
When tracks or data are accessed from the storage device, they are typically first loaded into cache before being returned to the application or device requesting the data. Because the accessed data remains in cache, a subsequent request for the data can be returned from cache rather than the storage device, which can be substantially faster than retrieving the data from the storage device. Returning data from cache, referred to as a cache hit, improves performance and system throughput because a cache memory provides faster access to data than many nonvolatile storage devices such as tapes, hard drives, or optical disks. A cache may also provide faster access to a main volatile memory, such as a random access memory (RAM). For instance, many processors include an "on-board" cache that caches data from RAM for the processor to use and subsequently access from the faster cache memory. In both cases, disk caching and memory caching, the cache provides a high-speed memory from which data may be returned more efficiently than from the storage device or main memory where the data is maintained.
After the cache utilization reaches a certain upper limit, the cache manager will demote data from cache to make room for subsequently accessed tracks. Areas of cache marked as demoted may then be overwritten by new data, making room for data more recently accessed from storage devices.
In some storage systems, a least recently used (LRU) algorithm is used to manage cached data and determine which tracks are demoted. A linked list stores a record of when particular tracks stored in cache were last accessed. When a track is added to cache, a pointer to the track in cache is placed at the top of the LRU linked list, indicating that the track has been accessed recently (i.e., the track becomes a most recently used (MRU) track). If a track already in cache is again accessed, then the pointer to that track in cache is placed at the top of the LRU list. When the cache manager determines that data must be demoted or removed from cache to make room for subsequent data accesses, the cache manager will demote tracks whose pointers are at the bottom of the LRU list, representing those tracks that were accessed the longest time ago as compared to other tracks in cache.
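The LRU scheme described above can be sketched as follows. This is a minimal illustration only, not any actual storage controller's implementation; the class and method names are hypothetical, and Python's OrderedDict stands in for the LRU linked list (the last entry plays the role of the MRU end of the list).

```python
from collections import OrderedDict

class LRUCache:
    """Illustrative sketch of the LRU track management described above."""

    def __init__(self, capacity):
        self.capacity = capacity
        # Insertion order models the LRU list: first entry = LRU end,
        # last entry = MRU end.
        self.tracks = OrderedDict()

    def access(self, track_id, data=None):
        if track_id in self.tracks:
            # Track already in cache: move its pointer to the MRU end.
            self.tracks.move_to_end(track_id)
            return self.tracks[track_id]  # cache hit
        # Cache miss: stage the track into cache, demoting the LRU
        # track first if the cache is at its upper limit.
        if len(self.tracks) >= self.capacity:
            self.tracks.popitem(last=False)  # demote LRU track
        self.tracks[track_id] = data
        return data
```

For example, with a two-track cache, re-accessing a track moves it to the MRU end so that a later staging demotes the other, older track instead.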
Although the above LRU-based caching implementation can be useful in many applications, in remote copy or replication systems, existing LRU algorithms can cause suboptimal cache performance by causing wanted tracks to be prematurely removed from cache, resulting in a poor cache hit ratio. When replicating a volume using, for example, asynchronous peer-to-peer remote copy (PPRC), after the PPRC transfer to secondary storage is complete, a PPRC agent accesses the track a final time to specify that the track is demotable. This causes the DEMOTABLE bit for the track to be set and the track to be removed from cache shortly after the PPRC transfer is complete. This behavior is designed to remove tracks from cache that were inserted into cache solely for the remote copy process, that otherwise would not be stored in cache, and that are unlikely to be accessed again. In some cases, though, tracks are resident in cache before PPRC begins (e.g., because they were recently accessed by another application). In that case, it is inefficient to remove the tracks from cache after PPRC is complete, as they may be tracks that are regularly accessed by other applications.
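The inefficiency described above can be illustrated with a short sketch. The field and function names here (demotable, resident_before_copy, and so on) are hypothetical and chosen for illustration; they do not correspond to any actual PPRC implementation. The point is that the final access by the copy agent marks every copied track demotable, including tracks that were resident in cache before the copy began.

```python
class CachedTrack:
    """A cached track with the DEMOTABLE-style flag described above."""

    def __init__(self, track_id, resident_before_copy=False):
        self.track_id = track_id
        self.demotable = False
        # Whether the track was already in cache before the remote
        # copy began (e.g., staged by another application).
        self.resident_before_copy = resident_before_copy

def finish_remote_copy(track):
    # The copy agent's final access marks the track demotable,
    # regardless of why the track was in cache.
    track.demotable = True

def demote_pass(cache):
    # The cache manager removes demotable tracks shortly after the
    # transfer completes, returning only the tracks that survive.
    return [t for t in cache if not t.demotable]
```

Running a demote pass after the copy removes the pre-resident track along with the copy-only track, even though the pre-resident track may be regularly accessed by other applications.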
As such, there is a need in the art to improve cache management during remote copy or replication (or in any other system that copies data from a first storage device to a second storage device) to improve performance and data throughput.