1. Field of the Invention
The present invention relates to a computer program product, system, and method for wait classified cache writes in data storage systems.
2. Description of the Related Art
Data storage systems, particularly at the enterprise level, are usually designed to provide a high level of redundancy to reduce the risk of data loss in the event of failure of a component of the data storage system. Thus, multiple copies of data are frequently stored on multiple systems which may be geographically dispersed. Accordingly, data from a host to be stored in the data storage system is typically directed to a primary device of a primary data storage system at a local site and then replicated to one or more secondary devices of secondary data storage systems which may be geographically remote from the primary data storage system. One primary device can have multiple secondary relationships in which data directed to the primary device is replicated to multiple secondary devices.
The process of replicating, that is, copying or mirroring data over to the secondary data storage device, can be set up in either a synchronous or asynchronous relationship between the primary data storage device and the secondary data storage device. In a synchronous relationship, any updates to the primary data storage device are typically synchronized with the secondary data storage device, that is, successfully copied over to the secondary data storage device, before the primary data storage device reports to the host that the data storage input/output operation has been successfully completed. In an asynchronous relationship, successful updates to the primary data storage device are typically reported to the host as a successful storage input/output operation without waiting for the update to be replicated to the secondary data storage device.
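The difference between the two relationships can be illustrated with a minimal sketch. The names `Device`, `host_write`, `drain`, and the `pending` list are hypothetical and chosen only for illustration; an actual storage system would replicate over a network link and handle failures, which this sketch omits.

```python
class Device:
    """Hypothetical stand-in for a primary or secondary storage device."""

    def __init__(self):
        self.blocks = {}

    def write(self, key, data):
        self.blocks[key] = data


def host_write(primary, secondary, pending, key, data, synchronous):
    """Write to the primary and report completion to the host."""
    primary.write(key, data)
    if synchronous:
        # Synchronous: the secondary copy is made before completion is reported.
        secondary.write(key, data)
    else:
        # Asynchronous: completion is reported first; replication happens later.
        pending.append((key, data))
    return "complete"


def drain(secondary, pending):
    """Replicate any updates that were deferred by asynchronous writes."""
    while pending:
        key, data = pending.pop(0)
        secondary.write(key, data)
```

In this sketch, a synchronous write leaves both devices consistent when `host_write` returns, whereas an asynchronous write leaves the secondary device stale until `drain` is called.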
A storage controller may control a plurality of storage devices that may include hard disks, tapes, etc. A cache may also be maintained by the storage controller, where the cache may comprise a high speed storage that is accessible more quickly in comparison to certain other storage devices, such as hard disks, tapes, etc. However, the total amount of storage capacity of the cache may be relatively small by comparison to the storage capacity of certain other storage devices, such as hard disks, etc., that are controlled by the storage controller. The cache may comprise one or more of random access memory (RAM), non-volatile storage (NVS), read cache, write cache, etc., that may interoperate with each other in different ways. The NVS may comprise a battery backed-up random access memory and may allow write operations to be performed at a high speed. The storage controller may manage Input/Output (I/O) requests from networked hosts to the plurality of storage devices.
Caching techniques implemented by the storage controller assist in hiding input/output (I/O) latency. The cache may comprise a high speed memory or storage device used to reduce the effective time required to read data from or write data to a lower speed memory or storage device. The cache is used for rapid access to data staged from external storage to service read data access requests, and to provide buffering of modified data. Write requests are written to the cache and then written (i.e., destaged) to the external storage devices.
To guarantee continued low latency for writes, the data in the NVS may have to be drained, that is, destaged, so as to ensure that there is always some empty space for incoming writes; otherwise, follow-on writes may become effectively synchronous, which may adversely impact the response time for host writes. Indeed, host writes to a primary data storage system may be intentionally slowed, or “throttled,” by slowing cache write operations on the secondary data storage system, which caches data mirrored from the primary data storage system. Such throttling of host writes to the primary data storage system may facilitate completely draining a cache on the secondary data storage system in anticipation of loading new programming code on a cluster or other processor of a storage controller of the secondary data storage system.
A Task Control Block (TCB) is a task control data structure in the operating system kernel containing the information needed to manage a particular process. Storage controllers may move information to and from storage devices, and to and from the cache (including the NVS), by using TCBs to manage the movement of data. When a write request issues from a host computer to a storage controller, a TCB may be allocated from the operating system code. The TCB is used to maintain information about the write process from beginning to end as data to be written is passed from the host computer through the cache to the storage devices. If the cache is full, the TCB may be queued until existing data in the cache can be destaged (i.e., written to storage devices) in order to free up space. The destage operations may involve the moving of information from cache to storage, such as Redundant Array of Independent Disks (RAID) storage, and destage TCBs may be allocated for performing the destage operations.
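The queuing of a write when the cache is full, and its release once a destage frees space, can be sketched as follows. The class `WriteCache`, its capacity measured in tracks, and the choice to destage the oldest modified track first are assumptions made for illustration; a queued TCB is represented here simply as a `(track, data)` pair.

```python
from collections import deque


class WriteCache:
    """Hypothetical write cache holding a fixed number of modified tracks."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.modified = {}       # track id -> data not yet destaged
        self.storage = {}        # stands in for the backing storage devices
        self.waiting = deque()   # queued write TCBs, modeled as (track, data)

    def write(self, track, data):
        """Cache the write if there is room; otherwise queue its TCB."""
        if track in self.modified or len(self.modified) < self.capacity:
            self.modified[track] = data
            return "cached"
        self.waiting.append((track, data))
        return "queued"

    def destage_one(self):
        """Destage the oldest modified track, then admit one queued write."""
        if not self.modified:
            return None
        track = next(iter(self.modified))  # dicts preserve insertion order
        self.storage[track] = self.modified.pop(track)
        if self.waiting:
            queued_track, queued_data = self.waiting.popleft()
            self.modified[queued_track] = queued_data
        return track
```

With a capacity of two tracks, a third write is queued, and a single call to `destage_one` both moves the oldest track to storage and admits the queued write into the cache.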
TCBs may be classified on the basis of the task being controlled by the particular TCB. For example, a “background” TCB is a TCB that controls an operation which is not directly related to a host input/output operation. Thus, one example of a background TCB is a TCB which controls a destage operation as a background operation not required as part of a particular host I/O operation. Another example of a background TCB is a TCB which controls a prestage of tracks from storage to cache in which the prestage operation is being performed as a background operation not required as part of a particular host I/O operation.
Another type of TCB is a “foreground” TCB that controls an operation which is typically directly related to a host input/output operation. For example, a foreground TCB may be allocated to perform a destage or stage operation on behalf of a host I/O operation. Thus, a cache miss on a host read typically causes a stage operation controlled by a foreground TCB, to stage one or more tracks from storage to cache to satisfy the host read operation.
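The foreground and background classification described above can be sketched as a small data structure. The names `TcbClass`, `Tcb`, and `read_track` are hypothetical, and the cache and storage are modeled as plain dictionaries; the sketch only shows that a host read miss leads to a stage operation under a foreground TCB, while a read hit allocates no stage TCB.

```python
from dataclasses import dataclass
from enum import Enum


class TcbClass(Enum):
    FOREGROUND = "foreground"   # directly related to a host I/O operation
    BACKGROUND = "background"   # not required by a particular host I/O


@dataclass
class Tcb:
    operation: str       # e.g. "stage", "destage", or "prestage"
    track: int
    tcb_class: TcbClass


def read_track(cache, storage, track, log):
    """Serve a host read, staging the track on a cache miss."""
    if track not in cache:
        # Cache miss: a foreground TCB controls the stage for this host read.
        tcb = Tcb("stage", track, TcbClass.FOREGROUND)
        log.append(tcb)
        cache[track] = storage[track]
    return cache[track]
```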
Storage controllers frequently employ a safe data commit process which scans the cache directory for modified (often referred to as “dirty”) data to be destaged to secondary storage. Such a scan of the cache directory may be initiated on a periodic basis, such as on the hour, for example. A safe data commit process may also be initiated to completely empty a cache in anticipation of a programming load for a processor which caches data in the particular cache.
For example, prior to loading updated programming code on a secondary storage system that has two processing clusters, one processing cluster is quiesced and the caches for both clusters are completely destaged in a “ratchet” process in which the amount of modified data allowed in each cache is ratcheted downward in a sequence of ratchet operations. The amount of modified data remaining in a cache is compared to a modified data threshold level which specifies a target level of modified data to be permitted in the cache. If the amount of modified data in cache is below the target threshold level, a task control block assigned to write one or more tracks of modified data to cache is dispatched and allocates one or more segments of cache to write the track or tracks to cache for subsequent destaging to storage. Conversely, if the actual amount of modified data in cache is above the target threshold level, a task control block assigned to a cache write operation is queued at the end of a wait queue to wait on the wait queue for a minimum duration of time, such as six seconds, for example, instead of being immediately dispatched to allocate segments of cache. Once the enqueued task control block reaches the front of the wait queue, if the task control block has been enqueued on the wait queue for at least six seconds, the task control block is dispatched and allocates one or more segments of cache to write a track of modified data to cache. In this manner, a cache write operation may be made to wait for at least six seconds before cache segments are allocated to write modified data in the cache. This throttles input/output on the primary data storage system, since each write on the secondary data storage system waits at least six seconds.
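The wait queue behavior described above can be sketched with a simulated clock. The class `ThrottledCache`, the counting of modified data in whole tracks, and the explicit `now` argument are assumptions made for illustration; the six-second minimum wait is the example value given above.

```python
from collections import deque

MIN_WAIT_SECONDS = 6.0  # example minimum wait from the description above


class ThrottledCache:
    """Hypothetical cache that delays writes while over its threshold."""

    def __init__(self, threshold):
        self.threshold = threshold   # target level of modified tracks
        self.modified_tracks = 0
        self.wait_queue = deque()    # entries are (tcb, enqueue_time)

    def request_write(self, tcb, now):
        """Dispatch at once if below the threshold; otherwise enqueue."""
        if self.modified_tracks < self.threshold:
            self.modified_tracks += 1
            return "dispatched"
        self.wait_queue.append((tcb, now))
        return "queued"

    def service_queue(self, now):
        """Dispatch TCBs at the front that have waited the minimum time."""
        dispatched = []
        while self.wait_queue and now - self.wait_queue[0][1] >= MIN_WAIT_SECONDS:
            tcb, _ = self.wait_queue.popleft()
            self.modified_tracks += 1   # one track is written to cache
            dispatched.append(tcb)
        return dispatched
```

In this sketch, a TCB enqueued at time 1.0 is still waiting when the queue is serviced at time 5.0 and is dispatched only when the queue is serviced at time 7.0 or later.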
If the cache write operation is a multi-track write operation, after one track has been written to cache, the task control block for the multi-track cache write operation is re-enqueued at the end of the wait queue to wait again on the wait queue for the minimum duration of time which is typically six seconds as noted above. Once the re-enqueued task control block reaches the front of the wait queue, if the task control block has been re-enqueued on the wait queue for at least another six seconds, the task control block is again dispatched and allocates one or more additional segments of cache to write the next track of modified data of the multi-track cache write to cache.
By delaying cache write operations on the secondary storage system, host writes to the primary storage system are also delayed, thereby reducing or “throttling” the overall number of host write operations to the primary storage system. As a consequence, the number of cache write operations to the caches of the secondary storage system is also reduced, thereby facilitating draining or destaging the caches of the secondary storage system in anticipation of a programming code load.
As part of the process of draining the cache entirely, the modified data target threshold level is periodically ratcheted down to the next lower level and host write operations to the primary storage system are throttled down as needed until the caches of the secondary storage system have been completely drained. At that point, host output operations may be blocked, storage ownership changed so that the non-quiesced cluster is assigned (“owns”) the logical subsystems of storage volumes previously owned by the quiesced cluster, and new programming code may be loaded on the quiesced processing cluster of the secondary storage system. Host output operations may then be resumed and operations of the quiesced processing cluster may be resumed as well, permitting the caches of the secondary storage system to refill. The process may be repeated to quiesce and load new programming code on each cluster of the secondary data storage system, fully draining the caches of the secondary storage system each time.
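The ratcheting of the modified data target threshold can be sketched as a simple loop. The function `ratchet_drain`, the particular threshold sequence, and the fixed amount destaged per step are hypothetical; the sketch also ignores new incoming writes, which in practice are throttled rather than absent.

```python
def ratchet_drain(modified, thresholds, destage_per_step):
    """Lower the target threshold step by step, destaging until each is met.

    Returns a list of (threshold, modified) pairs recorded after each
    ratchet operation. All quantities are in arbitrary units of modified data.
    """
    history = []
    for threshold in thresholds:
        # Destage until the amount of modified data is at or below the target.
        while modified > threshold:
            modified = max(0, modified - destage_per_step)
        history.append((threshold, modified))
    return history
```

For example, starting with 100 units of modified data, thresholds of 75, 50, 25, and 0, and 10 units destaged per step, the final recorded pair is `(0, 0)`, corresponding to a completely drained cache.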
Mirroring operations which mirror data from a primary storage system to a secondary storage system are typically suspended if a write operation from the primary storage system to the secondary storage system does not complete within a predetermined maximum time-out period, such as twenty seconds, for example. Because the delay imposed on a typical cache write operation on the secondary storage system to facilitate draining the cache is typically substantially less than the twenty-second time-out period, suspension of the mirroring operations as a result of throttling operations may frequently be avoided. However, if the cache write operation is a multi-track write operation of a multi-track mirror operation, repeated enqueuing of the task control block on the wait queue for each track of the multi-track cache write operation can cause the suspend time-out period to be exceeded for the multi-track mirror operation, resulting in an undesirable suspension of mirroring operations.
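The arithmetic behind this problem can be made concrete using the example values of a six-second minimum wait per track and a twenty-second time-out. The function names below are hypothetical, and the sketch assumes the task control block is enqueued once for every track and waits only the minimum duration each time; actual waits may be longer.

```python
MIN_WAIT_SECONDS = 6          # example minimum wait per enqueue
SUSPEND_TIMEOUT_SECONDS = 20  # example mirroring time-out period


def minimum_total_wait(tracks):
    """Minimum cumulative wait when the TCB is enqueued once per track."""
    return tracks * MIN_WAIT_SECONDS


def would_suspend(tracks):
    """True if the cumulative minimum wait alone exceeds the time-out."""
    return minimum_total_wait(tracks) > SUSPEND_TIMEOUT_SECONDS
```

Under these assumptions, a three-track write waits at least 18 seconds and stays within the time-out, whereas a four-track write waits at least 24 seconds and exceeds it.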