1. Field of the Invention
The present invention relates to the field of data storage systems and, more particularly, to coping with faults affecting write caching of data in a data storage system.
2. Description of the Related Art
A typical data storage system includes a controller for controlling operation of the storage system, a cache memory and a mass storage medium, such as hard disk drives. The cache memory serves as a buffer for read and write operations. More particularly, when a host system sends a write request for storing data in the storage system, the data is first stored redundantly in the cache memory and the storage system sends a response to the host. Later, the data can be written asynchronously to the mass storage medium while new requests continue to be received into the cache memory.
For coping with faults, data storage systems are usually equipped with two caches and two or more redundant controllers. Therefore, when a fault occurs that affects one of the controllers or cache memories, a remaining one can continue to function. Thus, when the host system sends data to be stored by the storage system, the data is stored in both cache memories. Then, if a fault occurs that affects one of the cache memories, the data can be obtained from the remaining cache memory.
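The redundant write caching described above can be illustrated with a minimal sketch. It is a toy model only, not the implementation of any particular storage system; the class name MirroredWriteCache and its methods are hypothetical, and plain dictionaries stand in for the cache memories and the mass storage medium.

```python
class MirroredWriteCache:
    """Toy model (hypothetical names): two cache memories in front of a disk."""

    def __init__(self):
        self.caches = [{}, {}]        # two redundant cache memories
        self.healthy = [True, True]   # fault status of each cache memory
        self.disk = {}                # stands in for the mass storage medium

    def write(self, block, data):
        # Store the data in every healthy cache memory, then respond to the host.
        for cache, ok in zip(self.caches, self.healthy):
            if ok:
                cache[block] = data
        return "ack"

    def fail_cache(self, index):
        # Simulate a fault that makes one cache memory unusable.
        self.healthy[index] = False
        self.caches[index].clear()

    def flush(self):
        # In a real system this would run asynchronously, after the response.
        survivors = [c for c, ok in zip(self.caches, self.healthy) if ok]
        if not survivors:
            raise RuntimeError("no cache memory left; unwritten data is lost")
        self.disk.update(survivors[0])
        for cache in self.caches:
            cache.clear()
```

In this sketch, a write is acknowledged before anything reaches the disk, and a later flush still succeeds after fail_cache(0) because the second cache memory holds a copy of the data.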
Once a fault occurs that affects one of the cache memories, another fault could then occur that affects the remaining cache memory. This could result in the permanent loss of data in cache memory that has not yet been written to the mass storage. Therefore, upon the occurrence of a failure that affects one of the cache memories, a conventional data storage system enters a “safe mode” in which the data for each write request is immediately written to the mass storage before the response is sent to the host. This minimizes the amount of data that would be lost in the event a fault affects the remaining cache memory. Unfortunately, operation in safe mode also reduces the rate at which the data storage system is able to process requests. This is because disk accesses tend to take longer than writes to cache and because the ability to reorder the requests to increase the efficiency of disk accesses is lost. In some circumstances, the performance can be degraded to the point that the storage system is unable to keep up with demand and has, thus, lost its usefulness.
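The difference between normal operation and safe mode can be sketched as follows. This is an illustration under stated assumptions, not a description of any actual controller: the class SafeModeStore, its attributes, and the choice of ascending block order as the "efficient" disk order are all hypothetical.

```python
class SafeModeStore:
    """Toy model (hypothetical names) of write caching with a safe mode."""

    def __init__(self):
        self.cache = {}         # the remaining cache memory
        self.disk = {}          # stands in for the mass storage
        self.disk_log = []      # order in which blocks reached the disk
        self.safe_mode = False  # set to True after a cache fault

    def _write_disk(self, block, data):
        self.disk[block] = data
        self.disk_log.append(block)

    def write(self, block, data):
        if self.safe_mode:
            # Safe mode: the data reaches the disk before the response is sent.
            self._write_disk(block, data)
        else:
            # Normal mode: respond once the data is in cache; disk write is deferred.
            self.cache[block] = data
        return "ack"

    def flush(self):
        # Deferred writes can be reordered; ascending block order is assumed
        # here as a stand-in for an efficient disk access pattern.
        for block in sorted(self.cache):
            self._write_disk(block, self.cache[block])
        self.cache.clear()
```

Writing blocks 9, 2 and 5 in normal mode and then flushing reaches the disk in the order 2, 5, 9, whereas in safe mode the same requests reach the disk in arrival order, one disk access per request, before each response.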
This problem is exacerbated in some storage systems in which each controller is paired with a cache memory and is often located on the same printed circuit board as that cache memory. In this case, a fault affecting any of the controllers or boards is likely to affect at least one of the cache memories, necessitating entry into safe mode. For such systems, the likelihood of having to enter safe mode can be significant.
Therefore, what is needed is an improved technique for coping with faults affecting cache memory in a data storage system. It is to this end that the present invention is directed.