1. Field of the Invention
The present invention relates to a method and system for caching data writes in a storage system and, in particular, maintaining information on the data writes for data recovery purposes.
2. Description of the Related Art
Current storage systems include a cache, which receives modified data, i.e., data writes, and a battery backed-up random access memory (RAM), also referred to as a non-volatile storage unit ("NVS"), to back up the modified data maintained in cache. In this way, if the system fails, a copy of modified data may be recovered from NVS. For instance, a storage controller, including a processor, cache and NVS, receives data writes from host systems, such as a mainframe computer, server or other computer system, intended for a Direct Access Storage Device (DASD) managed by the storage controller. In a cache fast write operation, the storage controller receives a data write and writes the received data to cache without writing a copy to the NVS. In a DASD fast write operation, the storage controller writes the received data to both the cache and NVS.
During destaging operations, the storage controller writes the modified data in the cache to DASD. If modified data was also written to NVS in a DASD fast write operation, then the storage controller removes the copy of the destaged data from NVS. Thus, with cache fast write operations, the storage controller risks losing data stored in cache if there is a system failure, whereas with DASD fast write operations, if there is a failure, the modified data may be recovered from NVS. Current storage controller systems that utilize the DASD and cache fast write operations include the International Business Machines Corporation's 3990 Storage Controller, described in IBM publication "IBM 3990 Storage Control Reference (Models 1, 2, and 3)," IBM document no. GA32-0099-06 (Copyright IBM 1988, 1994), which publication is incorporated herein by reference in its entirety.
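The difference between the two write operations and the effect of destaging may be illustrated with a minimal sketch. The class and method names below are hypothetical and chosen only for illustration; they are not taken from the IBM 3990 or any other product, and the dictionaries merely stand in for cache, NVS and DASD.

```python
class StorageController:
    """Toy model of the two write modes; all names are hypothetical."""

    def __init__(self):
        self.cache = {}  # track id -> modified data (volatile)
        self.nvs = {}    # track id -> backup copy (battery backed)
        self.dasd = {}   # track id -> data on disk

    def cache_fast_write(self, track, data):
        # Modified data goes to cache only; no NVS copy is made.
        self.cache[track] = data

    def dasd_fast_write(self, track, data):
        # Modified data goes to both cache and NVS.
        self.cache[track] = data
        self.nvs[track] = data

    def destage(self, track):
        # Write the cached data to DASD, then drop any NVS backup copy.
        self.dasd[track] = self.cache.pop(track)
        self.nvs.pop(track, None)

    def fail_and_recover(self):
        # Simulate loss of the volatile cache, then restore what NVS holds.
        self.cache.clear()
        self.cache.update(self.nvs)
        return set(self.cache)


ctl = StorageController()
ctl.cache_fast_write(1, "A")   # not backed up in NVS
ctl.dasd_fast_write(2, "B")    # backed up in NVS
ctl.dasd_fast_write(3, "C")
ctl.destage(3)                 # track 3 reaches DASD; NVS copy removed
recovered = ctl.fail_and_recover()
```

In this sketch only track 2 is recoverable after the simulated failure: track 1 was written with a cache fast write and is lost, while track 3 had already been destaged.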
Pinned data is data that the storage controller cannot destage because of a failure of the DASD, track format errors, or a failure to read both the cache and the NVS storage copies. Both DASD fast write and cache fast write data can be pinned. Pinned data cannot be removed and the space it occupies cannot be used again until either the problem is fixed, or a host program discards the data or forces the cache to be unavailable. The storage controller attempts to destage pinned data when the track is accessed, or a not-ready-to-ready interrupt is received for the device. Once all the pinned data for a device is cleared, the suspended fast write operations may be resumed. The service representative may have to fix the fault before the data can be destaged.
To preserve data integrity, some current systems utilize the DASD fast write procedure to back up modified data in NVS in case the cache copy of the modified data is lost. This operation of storing modified data in both cache and NVS can consume significant bandwidth, storage, and processor resources to carry out both copy operations. To avoid the costly backup operations to both cache and NVS, certain systems store modified data only in cache. Some of these systems provide a backup battery that supplies the cache with power for a brief period of time should the system enter a failover mode. During this brief time that the cache is powered by the backup battery, modified data may be destaged from cache. Systems that store data only in cache risk jeopardizing data integrity in the event that modified data is lost when the battery backing up cache expires, the cache fails or the system shuts down. Data integrity is jeopardized in such cache-only systems when the modified data in cache is lost, because the system will have no knowledge of which data was modified. Consequently, the system could return stale data from storage in response to a read request.
To provide an improved data storage system, preferred embodiments disclose a system and method for caching data. A processor receives data from a host to modify a track in a first storage device. The processor stores a copy of the modified data in a cache and indicates in a second storage device the tracks for which there is modified data in cache. During data recovery operations, the processor processes the second storage device and data therein to determine the tracks for which there was modified data in cache. The processor then marks the determined tracks as failed to prevent data at the determined tracks in the first storage device from being returned in response to a read request until the failure is resolved.
Such embodiments conserve system resources because modified data in cache does not have to be backed up in a second storage device. Moreover, data integrity problems are avoided because, in the event of a system failure and loss of the modified data in cache, the processor has information stored in the second storage device on those tracks that had modified data in cache before the failure. The processor will not return stale data from the first storage device until the modified data in cache that was lost when the system failed is recovered.
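The track-indication scheme described above may be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the names `Controller`, `nvs_tracks`, `failed` and `TrackReadError` are hypothetical, and the sketch assumes that the indication for a track is cleared from the second storage device when that track is destaged.

```python
class TrackReadError(Exception):
    """Raised when a read targets a track marked as failed."""


class Controller:
    def __init__(self, disk):
        self.disk = dict(disk)   # first storage device: track -> data
        self.cache = {}          # volatile cache: track -> modified data
        self.nvs_tracks = set()  # second storage device: track ids only
        self.failed = set()      # tracks marked as failed during recovery

    def write(self, track, data):
        # Keep the data in cache; record only the track id in NVS.
        self.cache[track] = data
        self.nvs_tracks.add(track)

    def destage(self, track):
        # Assumption: the NVS indication is cleared once data is on disk.
        self.disk[track] = self.cache.pop(track)
        self.nvs_tracks.discard(track)

    def crash(self):
        # Simulate a failure that loses the volatile cache contents.
        self.cache.clear()

    def recover(self):
        # Tracks indicated in NVS had modified data in cache; mark as failed.
        self.failed |= self.nvs_tracks

    def read(self, track):
        if track in self.failed:
            raise TrackReadError(f"track {track} has unrecovered updates")
        return self.cache.get(track, self.disk.get(track))


ctl = Controller({10: "old10", 11: "old11"})
ctl.write(10, "new10")
ctl.write(11, "new11")
ctl.destage(11)   # track 11 is safely on disk
ctl.crash()       # modified data for track 10 is lost
ctl.recover()
try:
    ctl.read(10)
    blocked = False
except TrackReadError:
    blocked = True
```

After recovery, a read of track 10 is refused rather than answered with the stale on-disk data, while track 11 is served normally because its update was destaged before the failure.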
In further embodiments, the processor may determine whether the received data is sequential data or random data before indicating in the second storage device the tracks having modified data in cache. In such case, the processor indicates the tracks having modified sequential data in the second storage device. Further, the processor may store a copy of modified random data in the second storage device. These further embodiments save bandwidth by avoiding the need to make a second copy of sequential data updates, which can consume a significant amount of bus bandwidth. Moreover, space in the second storage device is further preserved because sequential data updates could flush the second storage device of random data.
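The different handling of sequential and random updates may be sketched as below. The function name and the `is_sequential` flag are hypothetical; how the processor classifies a write as sequential or random is not specified here, so the sketch simply takes the classification as an input.

```python
def record_write(cache, nvs_data, nvs_tracks, track, data, is_sequential):
    """Store a write in cache and protect it in NVS according to its type.

    cache      -- dict, track -> modified data (volatile)
    nvs_data   -- dict, track -> full backup copy (random updates)
    nvs_tracks -- set of track ids only (sequential updates)
    """
    cache[track] = data
    if is_sequential:
        # Sequential data: record only which track was modified.
        nvs_tracks.add(track)
    else:
        # Random data: keep a full second copy in NVS.
        nvs_data[track] = data


cache, nvs_data, nvs_tracks = {}, {}, set()
record_write(cache, nvs_data, nvs_tracks, 100, "seq-block", is_sequential=True)
record_write(cache, nvs_data, nvs_tracks, 7, "rand-block", is_sequential=False)
```

Here the sequential update to track 100 costs the second storage device only a track identifier, while the random update to track 7 is fully backed up.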
In additional embodiments, the processor may handle a partial failure in a storage system by scanning the cache, in response to detecting a partial failure, to determine tracks for which there is modified data stored in the cache. The processor then stores in the second storage device information indicating the tracks having modified data in cache and schedules the destaging of the modified data from the cache to the first storage device. The processor is further capable of receiving and processing read/write requests directed to the first storage device before all the modified data is destaged from cache.
This additional embodiment provides further advantages because in the event of a partial failure, the processor will continue to process read/write transactions while modified data is being destaged from cache. At the same time, data integrity is assured because the second storage device keeps track of modified data in cache. Thus, in the event of a subsequent failure to the system that causes a loss of modified data in cache, the system will maintain in the second storage device information on modified tracks. Further, some of the modified information may have been destaged as a result of the destaging operations. When the system comes back online, the system will have knowledge of which tracks were modified and not destaged. The system may use this information to avoid returning data from the first storage device that is stale as a result of the failure to destage all the modified data from cache.
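The partial-failure handling may be illustrated with the following sketch. All names are hypothetical, the destage schedule is modeled as a simple first-in, first-out queue, and the sketch assumes that a track's indication is cleared from the second storage device once the track is destaged.

```python
from collections import deque


class Controller:
    def __init__(self, disk):
        self.disk = dict(disk)        # first storage device
        self.cache = {}               # volatile cache of modified data
        self.nvs_tracks = set()       # second storage device: track ids
        self.destage_queue = deque()  # tracks scheduled for destaging
        self.degraded = False

    def write(self, track, data):
        self.cache[track] = data
        # After a partial failure, new writes are also tracked and queued.
        if self.degraded and track not in self.nvs_tracks:
            self.nvs_tracks.add(track)
            self.destage_queue.append(track)

    def read(self, track):
        # Reads are still served while destaging is in progress.
        return self.cache.get(track, self.disk.get(track))

    def on_partial_failure(self):
        # Scan cache, record modified tracks in NVS, schedule destaging.
        self.degraded = True
        for track in self.cache:
            if track not in self.nvs_tracks:
                self.nvs_tracks.add(track)
                self.destage_queue.append(track)

    def destage_one(self):
        # Destage a single scheduled track; returns False when queue is empty.
        if not self.destage_queue:
            return False
        track = self.destage_queue.popleft()
        if track in self.cache:
            self.disk[track] = self.cache.pop(track)
        self.nvs_tracks.discard(track)
        return True


ctl = Controller({1: "old1", 2: "old2"})
ctl.write(1, "new1")
ctl.write(2, "new2")
ctl.on_partial_failure()
ctl.destage_one()              # track 1 reaches disk
value_during = ctl.read(2)     # still served from cache
ctl.cache.clear()              # subsequent failure loses the cache
not_destaged = set(ctl.nvs_tracks)
```

After the second failure, the second storage device still names track 2 as modified and not destaged, so the system can avoid returning the stale on-disk copy of that track.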