1. Field of the Invention
The present invention relates to a method, system, and program for restoring data in cache.
2. Description of the Related Art
Computing systems often include one or more host computers (“hosts”) for processing data and running application programs, direct access storage devices (DASDs) for storing data, and a storage controller for controlling the transfer of data between the hosts and the DASDs. Storage controllers, also referred to as control units or storage directors, manage access to a storage space comprised of numerous hard disk drives connected in a loop architecture, otherwise referred to as a DASD. Hosts may communicate Input/Output (I/O) requests to the storage space through the storage controller.
To maintain availability in the event of a failure, many storage controllers known in the prior art provide redundant hardware clusters. Each hardware cluster comprises a processor complex, cache, non-volatile storage (NVS), such as a battery backed-up Random Access Memory (RAM), and a separate power supply to provide connection paths to the attached storage. The NVS in one cluster backs up write data from the cache in the other cluster so that if one cluster fails, the write data in the cache of the failed cluster is stored in the NVS of the surviving cluster. After one cluster fails, all I/O requests are directed toward the surviving cluster. When both clusters are available, each cluster may be assigned to handle I/O requests for specific logical storage devices configured within the physical storage devices.
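The cross-cluster backup arrangement described above can be illustrated with a minimal sketch. The class `Cluster`, the function `write`, and the device names `dev0` and `dev1` are hypothetical names introduced only for illustration; they do not correspond to any actual storage controller interface, and a real controller would involve hardware paths and destaging not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical model of one hardware cluster: a cache plus an NVS."""
    name: str
    cache: dict = field(default_factory=dict)  # (device, track) -> write data
    nvs: dict = field(default_factory=dict)    # backup of the peer's cached writes


def write(owner: Cluster, peer: Cluster, device: str, track: int, data: bytes) -> None:
    """Cache the write in the owning cluster and back it up in the peer's NVS."""
    owner.cache[(device, track)] = data
    peer.nvs[(device, track)] = data


cluster_a = Cluster("A")
cluster_b = Cluster("B")

# Assumption: logical device "dev0" is assigned to cluster A, "dev1" to cluster B.
write(cluster_a, cluster_b, "dev0", 7, b"alpha")
write(cluster_b, cluster_a, "dev1", 3, b"beta")
```

In this sketch, if cluster A were to fail, the write to `dev0` would still be held in the NVS of cluster B.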
In the event of a failure of one of the clusters, a failover occurs in which the surviving cluster handles all I/O requests previously handled by the failed cluster, so that access to the storage system managed by the storage controller remains available. As part of the failover process, the surviving cluster remains online and all the cached data for the failed cluster, i.e., the write data to the logical devices assigned to the failed cluster that was backed up in the NVS of the surviving cluster, is copied (also known as restored) from the NVS in the surviving cluster to the cache of the surviving cluster. Thus, after failover, the cache and NVS in the surviving cluster buffer writes that were previously directed to the failed cluster. During this restore/failover process, host I/O requests directed to logical devices previously assigned to the failed cluster are delayed until all writes to such logical devices in the NVS in the surviving cluster are restored/copied to the cache in the surviving cluster.
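The prior art restore behavior described above, in which host requests wait until every backed-up write has been copied from NVS to cache, can be sketched as follows. The class `SurvivingCluster` and its methods are hypothetical and serve only to illustrate the delay; the sketch is single-threaded, restores one entry per call, and does not distinguish a delayed request from a cache miss.

```python
class SurvivingCluster:
    """Hypothetical model of the cluster that remains online after its peer fails."""

    def __init__(self, nvs):
        self.nvs = dict(nvs)  # failed cluster's write data, backed up before the failure
        self.cache = {}
        self.pending = []     # NVS entries not yet restored to cache
        self.delayed = []     # host requests held until the restore completes

    def begin_failover(self):
        """Mark every backed-up write as needing restoration."""
        self.pending = list(self.nvs)

    def restore_one(self):
        """Copy one write from NVS to cache; release delayed requests when done."""
        if not self.pending:
            return []
        key = self.pending.pop()
        self.cache[key] = self.nvs[key]
        if self.pending:
            return []
        released = [(k, self.cache.get(k)) for k in self.delayed]
        self.delayed.clear()
        return released

    def read(self, key):
        """Return cached data, or None if the request is delayed by the restore."""
        if self.pending:
            self.delayed.append(key)
            return None
        return self.cache.get(key)


survivor = SurvivingCluster({("dev0", 7): b"alpha", ("dev0", 8): b"beta"})
survivor.begin_failover()
first = survivor.read(("dev0", 7))   # arrives mid-restore, so it is delayed
survivor.restore_one()
released = survivor.restore_one()    # last entry restored; delayed I/O released
after = survivor.read(("dev0", 7))   # served directly from cache
```

Note that the request made during the restore is held until the final NVS entry has been copied, even if the data it needs was restored earlier; this is the source of the delay discussed in this section.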
This restore process can take thirty seconds or more. Such a delay is often deemed unacceptable for storage controllers used in critical data environments where high availability is demanded. For instance, the systems used by large banks or financial institutions cannot tolerate delayed access to data for periods of several seconds, let alone thirty seconds or more.
For these reasons, there is a need in the art for improved techniques for handling data recovery in a manner that minimizes the time during which I/O requests to the storage are delayed.