The present invention relates to a storage volume reservation system and method. More particularly, the present invention relates to a storage volume reservation system and method for maintaining cache coherency amongst a plurality of caching controllers in a data storage system during a warm start cache recovery.
FIG. 1 illustrates a typical computer system 100 having a host computer 102 with a processor 104 and associated memory 106, one or more data storage subsystems 108, 110 each having a plurality of hard disk drives 112, 114, and first and second storage controllers 116, 118 coupled between the host computer and the storage subsystems by interfaces and communications links of conventional type, such as, for example, busses or network connections. When the first and second controllers 116, 118 are caching storage controllers (described below), each controller includes a cache memory 120, 122 that serves as intermediate storage. Usually the cache memory is fast random access memory (RAM), of which there are many types.
Increasingly, there is a need to provide access to stored information or data on hard disk drives (or other storage devices) from a plurality of host servers and to also permit the data stored on any particular storage device to be accessed through alternative device controllers. Providing access to the data from multiple hosts eliminates the need to store the data at more than one location (though the data may still be redundantly stored using known mirroring or Redundant Array of Independent Disks (RAID) techniques) and in theory assures that the identical data can be accessed by interested parties. Providing access to a storage device through a plurality of controllers provides redundant access to the device from an alternate (or second) controller, so that the data remains accessible in the event that the first controller fails.
Although providing access to storage devices through multiple controllers is desirable, such a configuration may present data consistency problems. Data consistency refers to all controllers providing visibility to one identical copy of the data. Data consistency can be provided through data synchronization or data coherency or both. Data coherency refers to maintaining a consistent copy of the data in each of the controllers' caches. Data synchronization refers to keeping the data in the storage controller's cache the same as that in the storage device.
A storage controller is a device which is capable of directing data traffic from the host system to one or more non-volatile storage devices. It may or may not have an intermediary cache to stage data between the non-volatile storage device and the host system. A caching controller (or caching storage controller) is a device which is capable of directing the data traffic from a host system to one or more non-volatile storage devices and which uses an intermediary data storage device (the cache memory) to stage data between the non-volatile storage device and the host system. In general, the intermediary storage device is built out of RAM to allow a quicker access time to the data. Furthermore, it provides a buffer in which exclusive-or (XOR) operations can be completed for RAID 5 operations. Multiple active controllers are defined as a collection of storage controllers or caching storage controllers which work in a cooperative manner with each other. They provide the ability to recover from a controller failure by allowing multiple paths to a storage volume. The storage volume is a contiguous range of randomly accessible sectors of data. For practical purposes, the sector numbering starts at 0 and goes to N, where N is the total number of sectors available to the host system. A data extent is a range of data within a storage volume delineated by a starting sector and an ending sector. The storage volume is broken up into a number of data extents which are not required to be of equivalent sizes, but may not overlap. These concepts are used in the discussion of the background and the detailed description of embodiments of the invention, and apply to both.
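By way of illustration only, the data extent definition above may be sketched as follows. The names DataExtent, validate_extents, and last_sector are hypothetical and do not appear in the specification; the sketch assumes that an extent includes both its starting and its ending sector.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataExtent:
    """A range of sectors delineated by a starting and an ending sector.

    Assumption: both bounds are inclusive.
    """
    start_sector: int
    end_sector: int

    def overlaps(self, other: "DataExtent") -> bool:
        # Two inclusive ranges overlap when each starts at or before the other ends.
        return (self.start_sector <= other.end_sector
                and other.start_sector <= self.end_sector)


def validate_extents(extents: List[DataExtent], last_sector: int) -> bool:
    """Return True if every extent lies within 0..last_sector and none overlap."""
    ordered = sorted(extents, key=lambda e: e.start_sector)
    for extent in ordered:
        if extent.start_sector < 0 or extent.end_sector > last_sector:
            return False
        if extent.start_sector > extent.end_sector:
            return False
    # Once sorted by starting sector, checking adjacent pairs detects any overlap.
    for left, right in zip(ordered, ordered[1:]):
        if left.overlaps(right):
            return False
    return True


# Extents of unequal size that do not overlap are acceptable.
ok = validate_extents(
    [DataExtent(0, 99), DataExtent(100, 499), DataExtent(500, 999)], 999)
# Extents sharing sector 100 overlap and are rejected.
bad = validate_extents([DataExtent(0, 100), DataExtent(100, 499)], 999)
```

The extents in this sketch are of different sizes, which the definition permits; only overlap and out-of-range bounds are rejected.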
Caching storage controllers that work independently of one another to store information or data to a secondary storage unit, such as a hard disk drive or tape unit, are conventionally available. There are also caching storage controllers that work with one or more other controllers to provide multiple controller access to a secondary storage unit and provide a fault tolerant environment. If two controllers are simultaneously providing access to a common set of storage devices and each is able to take over the other's functionality in the event of a failure, then those controllers are referred to as active-active or dual-active controllers.
Computer system configurations involving one or more host computers and having two or more controllers that use cache technologies, with access to a storage device through any of the controllers, should desirably provide some mechanism of ensuring that the cache data in each controller is always correct. Unfortunately, conventional systems may not. Controllers using the SCSI command set could use two commands that are provided in that command set, the "Reserve LUN" and "Release LUN" commands, where LUN is an abbreviation for Logical Unit Number. (SCSI commands, including the Reserve LUN and Release LUN commands, are described in standard references including SCSI-2 Small Computer System Interface-2, ANSI X3.131:1994, which is incorporated herein by reference.) The host computer, especially one operating in a cluster environment, could use these two commands to reserve all accesses to the entire LUN.
Unfortunately, not all host computers use these SCSI commands. Furthermore, the Reserve LUN and Release LUN commands do not provide for reservation of a portion of a storage volume because they apply to the entire storage volume.
The following description is provided relative to FIG. 2, and sets forth the problems with data synchronization between controllers which maintain local cached copies of data. This example details one set of host transactions that could cause data integrity problems (data consistency and data synchronization problems). The data consistency problem is brought about by the fact that each controller's cache operates independently.
With reference to the illustration in FIG. 2, there is shown a portion of computer system 130, including host computer 132 having a processor or central processing unit (CPU) 134, first controller (controller "A") 136, second controller (controller "B") 138, and a storage subsystem 140 including at least one backing storage volume 142. The two controllers 136, 138 further include, respectively, a first cache (Cache "A") 144 and a second cache (Cache "B") 146 for caching data retrieved from backing storage volume 142. Generic techniques for controller caching are known in the art and not described further here. Backing storage volume 142 is coupled to each of first and second controllers 136, 138 by storage interface channels 148, 150, and the host computer processor (CPU) 134 is coupled to the controllers by CPU-to-storage device interface 152. The interface 152 may typically be implemented as a Peripheral Component Interconnect (PCI), parallel SCSI, Fibre Channel, or IEEE-1394 (FireWire) interface using a storage, file system, or other communications protocol. In like manner, the controller-to-storage device interfaces 148, 150 may typically be implemented using the same set of interfaces and protocols as just described for interface 152. A logical unit number (LUN) is assigned or otherwise associated with each backing storage volume 142. The relationship between physical devices or portions thereof and logical devices is known in the art and not further described here.
In this configuration, if data is written to a logical unit, such as backing storage volume 142A, through first controller 136, the data is properly retained in the first controller's cache, that is, within cache 144. If data is subsequently written to logical storage unit 142A through second controller 138, the newly written data in backing storage volume 142A matches the data in the second controller's cache 146, but the information in the first controller's cache 144 will not have been updated, and (if the newly written data is different from the original data) no longer matches the data written to the backing storage volume 142A. If a request to read the data is made through first controller 136, the data will be read from cache 144 according to standard data caching and retrieval practices to minimize backing storage volume access, and the wrong information will be returned to the requestor. The data is said to lack coherency between different locations (that is, between one or more of the backing storage volume, cache 144, and cache 146), and is out of temporal synchronization as a result of the time-order of events involving the read, write, and caching operations.
Stating the problem by way of example in somewhat more concrete terms, in a system with two controllers 136, 138 attached to the same CPU/storage interface and sharing access to a backing storage volume 142, as illustrated in FIG. 2, host computer 132 writes data pattern "AAAA" to the backing storage volume through first controller 136. First controller 136 retains this information in its data cache 144 so that future requests for the data can be fulfilled without having to access the disk backing storage volume 142A. At a later time, the host computer writes the data pattern "BBBB" to backing storage volume 142A at the same location the "AAAA" data pattern had been stored, but now the write operation is directed through the second controller 138 as illustrated in FIG. 3. First controller 136 still has the "AAAA" data pattern stored in its cache 144, but second controller 138 has the "BBBB" data pattern stored in its cache 146. The data in cache 144 ("AAAA") and the data in cache 146 ("BBBB"), each supposedly representing the identical data, no longer match and are incoherent.
The correct data pattern on backing storage volume 142A ("BBBB") is the later data pattern also stored in cache 146 ("BBBB"), but if the host computer 132 attempts to read the information from backing storage volume 142A through first controller 136, first controller 136 will, using conventional techniques, be unaware of any controller 138 operations, and in particular will be unaware that a write operation has altered the data on the backing storage volume. Lacking knowledge that the data has changed, first controller 136 will access its own cache 144 to retrieve the data, and erroneously return that data pattern ("AAAA") rather than the correct data pattern ("BBBB") to the requesting host computer 132.
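The stale read just described may be reproduced with a minimal simulation. This is a sketch only: the class names BackingVolume and CachingController are hypothetical, and a write-through policy is assumed for simplicity so that each write reaches the backing storage volume immediately.

```python
class BackingVolume:
    """Stands in for backing storage volume 142A: sector number -> data."""

    def __init__(self):
        self.sectors = {}


class CachingController:
    """A caching controller with an independent local cache (no coherency)."""

    def __init__(self, name, volume):
        self.name = name
        self.volume = volume
        self.cache = {}

    def write(self, sector, data):
        # Assumed write-through: update the local cache and the backing volume.
        self.cache[sector] = data
        self.volume.sectors[sector] = data

    def read(self, sector):
        # Conventional caching: serve from the local cache whenever possible.
        if sector in self.cache:
            return self.cache[sector]
        data = self.volume.sectors[sector]
        self.cache[sector] = data
        return data


volume = BackingVolume()
controller_a = CachingController("A", volume)   # first controller 136
controller_b = CachingController("B", volume)   # second controller 138

controller_a.write(0, "AAAA")       # host writes "AAAA" through controller A
controller_b.write(0, "BBBB")       # host later writes "BBBB" through controller B
stale = controller_a.read(0)        # controller A answers from its own cache
```

In this sketch, stale holds "AAAA" while the backing volume and the cache of controller B both hold "BBBB", which is the incoherency described above.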
One technique for overcoming the data consistency problem described above is a storage volume reservation system and method as described in co-pending U.S. patent application Ser. No. 09/325,033, now U.S. Pat. No. 6,247,099, which is hereby incorporated by reference. The storage volume (or storage LUN) reservation system for active controllers operates in an environment that allows data access through two or more separate caching controllers. The inventive structure and method maintains a "reservation table" (such as a LUN reservation table) that is always consistent on each of the plurality of controllers. This structure and method also provide the capability of explicitly reserving storage volumes using any current storage volume (or LUN) reserve commands, or implicitly using a write operation. The inventive structure and method also provide the capability of invalidating a controller's cache based on acquiring a new reservation.
The storage volume reservation system and method provide that each controller is not required to reserve a storage volume in order to perform an update to that storage volume. An explicit reservation may be made through the use of Storage Volume Reserve commands, while an implicit reservation is made whenever a write operation requires that the particular controller obtain a reservation. Implicit reservations may occur, for example, when an alternate controller already owns the reservation. The reservation may also be obtained implicitly when the controller is required to perform a read operation, and the alternate controller already owns the reservation. This reservation requirement is imposed in order to ensure that the alternate controller's cache contains no data (dirty data) that has not been synchronized with the storage volume drive.
The reservation process is synchronized between all of the controllers in the system in order to maintain reservation table coherency. All updates to the reservation table are propagated to the alternate controllers to maintain reservation table coherency. This procedure allows most reads and writes to be performed with minimal overhead. An I/O operation to a storage volume that is reserved by that controller only needs to check for that ownership before processing the I/O operation request.
The reservation process also benefits from cache flushing and cache invalidating in some circumstances to maintain data integrity. Obtaining a reservation to a storage volume or portion thereof that is not owned by any controller is straightforward and only requires an update to the reservation table, and its propagation to all alternate controllers. Obtaining a reservation to a storage volume or portion thereof currently owned by an alternate active controller further requires that the alternate controller flush and invalidate all cache data associated with that storage volume. Releasing a reservation is not required but may optionally be performed using a storage volume release command.
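The reservation behavior described above (ownership check, table propagation, and flush-and-invalidate on a change of owner) may be sketched as follows. This is an illustrative model only, not the implementation of the referenced patent: the Controller class, its method names, the dictionary-based reservation table, and the write-back cache are all assumptions made for the example.

```python
class Controller:
    """Illustrative caching controller with a local reservation table copy."""

    def __init__(self, name, disk):
        self.name = name
        self.disk = disk            # shared dict standing in for the storage volume
        self.peers = []             # alternate controllers
        self.reservations = {}      # local table copy: volume -> owning controller
        self.cache = {}             # (volume, sector) -> [data, dirty flag]

    def _propagate(self, volume, owner):
        # Update the local table and every alternate controller's copy.
        self.reservations[volume] = owner
        for peer in self.peers:
            peer.reservations[volume] = owner

    def flush_and_invalidate(self, volume):
        # Write dirty data for the volume to disk, then drop its cached data.
        for key in [k for k in self.cache if k[0] == volume]:
            data, dirty = self.cache.pop(key)
            if dirty:
                self.disk[key] = data

    def reserve(self, volume):
        owner = self.reservations.get(volume)
        if owner == self.name:
            return                  # already owned: only the ownership check is needed
        if owner is not None:
            # Owned by an alternate controller: it must flush and invalidate first.
            holder = next(p for p in self.peers if p.name == owner)
            holder.flush_and_invalidate(volume)
        self._propagate(volume, self.name)

    def write(self, volume, sector, data):
        self.reserve(volume)        # implicit reservation on write
        self.cache[(volume, sector)] = [data, True]   # assumed write-back cache

    def read(self, volume, sector):
        self.reserve(volume)        # implicit reservation on read
        key = (volume, sector)
        if key not in self.cache:
            self.cache[key] = [self.disk[key], False]
        return self.cache[key][0]


disk = {}
a = Controller("A", disk)
b = Controller("B", disk)
a.peers, b.peers = [b], [a]

a.write("LUN0", 0, "AAAA")
b.write("LUN0", 0, "BBBB")      # A flushes and invalidates before B takes ownership
result = a.read("LUN0", 0)      # B flushes "BBBB"; A then reads it from disk
```

Under these assumptions the read through controller A returns "BBBB", because the change of reservation forces controller B to flush its dirty data before controller A repopulates its cache from the storage volume.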
The problem with the storage reservation system and method described above is what to do with cache data preserved by a battery backup unit (BBU) during a warm start cache recovery. In order to maintain data coherency, the dirty cache lines need to be written out to the disk. In addition, each controller needs access to the area of the storage volume required to write out the cache lines that it owns, and that access must be coordinated in order to maintain data integrity. Consequently, a storage volume reservation must be re-acquired for each dirty cache line. In fact, the entire process of storage volume reservation needed to maintain cache coherency must be repeated before any new input/output (I/O) processes can be accepted. In addition, any host initiated I/O that occurs before the dirty data from the warm start is written out must have its access to the storage volume coordinated so that data integrity is maintained. Any host I/O processes that were running prior to the power off are discarded, while any rebuild operations in process are either discarded or restarted depending on whether the automatic rebuild feature is enabled.
Therefore, there remains a need to overcome the above limitations in the existing art which is satisfied by the inventive structure and method described hereinafter.
The present invention overcomes the identified problems by providing a storage volume reservation system, method, and computer program for maintaining cache coherency amongst a plurality of caching controllers in a data storage system during a warm start cache recovery. More specifically, the invention provides a method of maintaining cache coherency amongst a plurality of caching storage controllers in a data storage system during a warm start utilizing a stripe lock data structure. The stripe lock data structure is defined to maintain reservation status of cache lines within data extents that are part of a logical unit or storage volume. A battery backup unit (BBU) preserves the stripe lock data structure and the dirty cache line data of each of the plurality of controllers during a power failure. Using the stripe lock data structure information, the delay required for continued processing of I/O requests from one or more host computers following the warm start cache recovery is minimized. Without saving the stripe lock data structure, continued processing of I/O requests from one or more host computers requires reestablishing stripe locks, during the warm start cache recovery, for cache line data saved before the power failure.
The inventive structure and method provide a storage volume reservation system in a computing environment that allows data access through two or more caching controllers. The stripe lock data structure is defined in memory within each of the two or more caching controllers. The stripe lock data structure is used to provide consistent information within each of the two or more caching controllers. A battery backup unit is configured to save the stripe lock data structure and cache line data of each of the two or more caching controllers during a power failure. The saved stripe lock data structure minimizes the delay required for continued processing of I/O requests from one or more host computers following the warm start cache recovery.
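The benefit of preserving the stripe lock data structure may be illustrated with a simplified sketch. The function warm_start_flush, the dictionary layout of the stripe lock data structure, and the use of a counter in place of the actual lock negotiation between controllers are all hypothetical and are chosen only to contrast the two cases.

```python
def warm_start_flush(controller, dirty_lines, saved_stripe_locks, disk):
    """Flush dirty cache lines after a warm start.

    Returns the number of stripe locks that had to be re-established.
    All names here are illustrative assumptions, not the claimed structure.
    """
    reacquired = 0
    for extent, data in dirty_lines.items():
        if saved_stripe_locks.get(extent) != controller:
            # Lock state was not preserved: the reservation would have to be
            # negotiated again with the alternate controllers (stubbed here).
            saved_stripe_locks[extent] = controller
            reacquired += 1
        # With the lock held, the dirty line can be written to the volume.
        disk[extent] = data
    return reacquired


# Dirty cache lines of controller "A" preserved across the power failure.
dirty = {("LUN0", 0): "BBBB", ("LUN0", 1): "CCCC"}

# Case 1: the stripe lock data structure was preserved along with the data.
preserved_locks = {("LUN0", 0): "A", ("LUN0", 1): "A"}
disk_with_locks = {}
with_locks = warm_start_flush("A", dirty, preserved_locks, disk_with_locks)

# Case 2: only the dirty data was preserved; every lock must be re-established.
disk_without_locks = {}
without_locks = warm_start_flush("A", dirty, {}, disk_without_locks)
```

In both cases the same data reaches the storage volume; the difference in this sketch is that no locks are re-established when the stripe lock data structure has been preserved, whereas one lock per dirty cache line is re-established when it has not.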
A computer program and computer program product for maintaining cache coherency amongst a plurality of caching storage controllers in a data storage system during a warm start are also provided, the computer program product including a computer readable medium and a computer mechanism stored thereon for implementing the inventive method and procedures thereof.
An advantage of the inventive method and structure is the elimination of the delay required to re-establish stripe locks in order to flush dirty cache line data to a storage volume during a warm start cache recovery.