The following copending, commonly-assigned patent applications describe a mirrored write-back cache system used with the present invention and are hereby incorporated by reference.
1. “Simultaneous Mirror Write Cache” invented by Tom Fava et al., U.S. patent application Ser. No. 08/671,154 filed Jun. 28, 1996, now U.S. Pat. No. 5,802,561.
2. “Enabling Mirror, Non-Mirror and Partial Mirror Cache Modes In a Dual Cache Memory” invented by Susan Elkington et al., U.S. patent application Ser. No. 08/671,153 filed Jun. 28, 1996, now U.S. Pat. No. 5,974,506.
3. “Controls For Dual Controller Dual Cache Memory System” invented by Clark Lubber et al., U.S. patent application Ser. No. 08/668,512 filed Jun. 28, 1996, now U.S. Pat. No. 6,279,078.
1. Field of the Invention
This invention relates to warmswap of cache modules in a mirrored cache system. More particularly, the invention relates to replacing memory modules while continuing to operate the mirrored cache system.
2. Description of the Related Art
For some time now, storage systems have been designed to remain in operation during the repair of single module failures in the storage system. In some peripheral storage systems, the system has been designed to permit a hotswap where, for example, a disk drive may be pulled and replaced with no preparatory operations by the storage system. In memory storage systems, more typically a warmswap procedure is followed. In a warmswap, the storage system remains operative during replacement of a module, but a predetermined procedure is invoked to prepare the storage system for replacement of the module. In effect, the storage system is quiesced (placed in a lower state of operative capacity), the failed module is replaced, and the storage system is brought back up to full operative capacity.
With the advent of mirrored cache systems, and particularly mirrored write-back cache systems, a new set of problems was created for maintaining operation of the cache storage system while replacing a component or module in the system. In mirrored cache systems, the data in cache is duplicated in separate memory modules. Thus, it should be possible to replace one memory module with little, or no, degradation of performance of the cache memory access time. However, the difficulty arises in protecting data in the good memory module while swapping the bad memory module. Further, once the bad memory module is replaced, the new memory module must be brought back up to the same level of data integrity as the good memory module to effectively heal the mirrored cache system.
In accordance with this invention, the above problems in replacing modules in a mirrored cache system are solved by disabling mirrored write operations in the cache system; testing the replacement memory module in the cache system; and restoring the mirrored data in the cache system. The restoring operation is accomplished by first quiescing write operations to stop writing data to the cache system that is not backed up in non-volatile data storage. Then data is copied from the surviving memory modules to the replacement module, and the cooperative interaction of the surviving memory modules with the replacement memory module is validated. The validating operation verifies that the cache modules are ready and the controllers are synchronized. After validation, the quiesced write operations are un-quiesced, and mirrored-write operations for the cache system are enabled.
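The ordering of the steps just described can be illustrated with a minimal Python sketch. It is not the implementation of the invention: the class names, the `passes_test` and `controllers_synced` flags, and the dictionary standing in for cache contents are all hypothetical, and the behavior on a failed validation (writes released, mirroring left off) is an assumption the text does not address.

```python
from dataclasses import dataclass, field


@dataclass
class CacheModule:
    data: dict = field(default_factory=dict)  # stand-in for cached data
    passes_test: bool = True                  # hypothetical self-test result


@dataclass
class MirroredCache:
    survivor: CacheModule
    replacement: CacheModule
    controllers_synced: bool = True           # hypothetical sync flag
    mirrored_writes: bool = True
    writes_quiesced: bool = False
    steps: list = field(default_factory=list)

    def recover(self) -> bool:
        # 1. Disable mirrored writes; only the survivor is written.
        self.mirrored_writes = False
        self.steps.append("disable_mirror")

        # 2. Test the replacement module before trusting it.
        self.steps.append("test_replacement")
        if not self.replacement.passes_test:
            return False

        # 3. Quiesce writes that are not backed up in non-volatile storage.
        self.writes_quiesced = True
        self.steps.append("quiesce_writes")

        # 4. Copy data from the survivor to the replacement.
        self.replacement.data = dict(self.survivor.data)
        self.steps.append("copy_data")

        # 5. Validate: modules agree and controllers are synchronized.
        valid = (self.replacement.data == self.survivor.data
                 and self.controllers_synced)
        self.steps.append("validate")

        # 6. Release the quiesced writes (assumed to happen either way).
        self.writes_quiesced = False
        self.steps.append("unquiesce_writes")

        # 7. Re-enable mirrored writes only after a successful validation.
        if valid:
            self.mirrored_writes = True
            self.steps.append("enable_mirror")
        return valid
```

Calling `MirroredCache(CacheModule({"blk": 1}), CacheModule()).recover()` walks through all seven steps and leaves both modules holding the same data.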
As a further feature of the invention, write-back operations are disabled during recovery of the cache system by switching the write operations to the cache system from write-back operations to write-through operations, in which all cache write operations are also written to non-volatile storage.
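The difference between the two write modes can be shown with a short Python sketch. The function name `cache_write`, the `write_back` parameter, and the dictionaries standing in for the cache and for non-volatile storage are hypothetical and are used only for illustration.

```python
def cache_write(cache: dict, disk: dict, key: str, value, write_back: bool) -> str:
    """Write one block and report whether the cached copy is dirty or clean."""
    cache[key] = value
    if write_back:
        # Write-back: non-volatile storage is updated later by a flush,
        # so the cache temporarily holds the only copy of the data.
        return "dirty"
    # Write-through: the same write also goes to non-volatile storage.
    disk[key] = value
    return "clean"
```

During recovery, calling `cache_write` with `write_back=False` keeps every cached block backed up, so losing the single remaining cache module would not lose data.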
In another embodiment of the invention, the cache system has two cache modules and two controllers, and each cache module has two quadrants of storage space, so that a mirrored write operation writes the same data to one quadrant in one cache module and to a paired quadrant in the other cache module. The method of recovering the cache system begins by disabling the mirrored write operations and enabling writing to only the good cache module. The failed cache module is replaced with a new cache module while writing continues to the remaining good cache module. The new cache module is tested in the cache system, and the mirrored write operations are restored to both the remaining good cache module and the new cache module. The write-back operations are disabled and write-through operations are enabled during recovery of the system. RAID write operations are quiesced to prevent writing data to the cache system that is not backed up in non-volatile storage. The metadata from both quadrants in the good cache module is copied to the assigned paired quadrants in the new cache module. After verification that all quadrants are operating correctly and the controllers are synchronized, the write-back and RAID write operations are enabled, and mirrored-write operations to the restored cache system are enabled.
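The copy of metadata into paired quadrants can be sketched in Python as follows. The pairing table `PAIRING` is a hypothetical assignment chosen only for illustration; this section does not state which quadrant is paired with which, and the metadata is modeled here as a plain dictionary per quadrant.

```python
# Hypothetical pairing: quadrant i of the good module is mirrored in
# quadrant PAIRING[i] of the new module. The real assignment is not
# given in this section.
PAIRING = {0: 1, 1: 0}


def copy_metadata(good: list, new: list, pairing: dict) -> None:
    """Copy each quadrant's metadata from the good module into the
    assigned paired quadrant of the new module."""
    for quadrant, metadata in enumerate(good):
        # Copy rather than alias, so the two modules stay independent.
        new[pairing[quadrant]] = dict(metadata)
```

For example, with `good = [{"lun0": "dirty"}, {"lun1": "clean"}]` and `new = [{}, {}]`, calling `copy_metadata(good, new, PAIRING)` places the first quadrant's metadata in the second quadrant of the new module and vice versa.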
As another feature of the invention, data copying from a good cache module to the new cache module, the releasing of quiesced write operations and the enabling of mirrored-write operations are all performed sequentially for each volume of data in the good module.
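A minimal Python sketch of this per-volume ordering follows. The function name `restore_by_volume`, the event labels, and the use of dictionaries keyed by volume name are hypothetical; the sketch only shows that each volume is copied, released, and mirrored before the next volume is touched.

```python
def restore_by_volume(good: dict, new: dict):
    """Restore mirroring one volume at a time.

    Returns the final per-volume state and the ordered list of events.
    """
    # Every volume starts quiesced and unmirrored (assumed starting point).
    state = {vol: {"quiesced": True, "mirrored": False} for vol in good}
    events = []
    for vol in good:
        new[vol] = dict(good[vol])        # copy this volume's data
        events.append(("copy", vol))
        state[vol]["quiesced"] = False    # release its quiesced writes
        events.append(("unquiesce", vol))
        state[vol]["mirrored"] = True     # enable mirrored writes for it
        events.append(("mirror_on", vol))
    return state, events
```

Handling volumes one at a time means that writes to an already restored volume resume while later volumes are still being copied.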
The great advantage and utility of the present invention is the extraordinary reliability of a cache system in which the invention is used. If the cache system continues to operate in write-back mode while the system is being recovered, the change in performance of the system during replacement of the module is barely perceptible to the user. The foregoing and other features, utilities and advantages of the invention will be apparent from the following more particular description of a preferred embodiment of the invention as illustrated in the accompanying drawings.