The present invention relates generally to an adaptive write policy for handling host write commands to write-back system drives in a dual active controller environment. More particularly, the present invention relates to a structure and method for determining whether it is more efficient to flush host write data directly to a system drive in accordance with a write-through policy, or to mirror the host write data and then flush the data to the storage medium in accordance with a write-back policy.
FIG. 1 is a block diagram that illustrates aspects of a typical active-active controller environment 100. In such active-active controller environments 100, primary controller 102 mirrors host write data 104 to alternate controller 108 to provide fault tolerance to system drive 110 implementing a write-back write policy. Host write data 104 is typically mirrored to alternate controller 108 for every host write command 106 received from host system 120. This procedure mirrors host write data 104 and responds to the host system 120 before flushing the host write data 104 to the system drive 110, in order to provide data fault tolerance to system 100. Host write data 104 is thus mirrored and protected in the event of a failure of controller 102 or 108.
Referring to the reference numbers in FIG. 1, we illustrate a typical prior art procedure for performing a write-back policy in system 100. First, primary controller 102 receives and accepts host write command 106. Next, primary controller 102 transfers host write data 104 into cache lines (not shown) stored in primary controller 102's memory (not shown). The primary controller 102 then mirrors host write data 104 to alternate controller 108 (see 3a). Alternate controller 108 verifies to primary controller 102 that the mirrored data was received (see 3b). Once the mirrored data is verified, write status 124 is sent to the host system 122. Finally, dirty host write data 104 is flushed to the system drive 110.
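The ordering of the prior art write-back steps above can be sketched as follows. This is a minimal illustration only, not the controller firmware: the function name `write_back`, the use of dictionaries for the two controller caches and the system drive, and the event strings are all hypothetical. The sketch also assumes that cached copies are simply dropped once the data reaches the drive.

```python
def write_back(primary_cache, alternate_cache, system_drive, lba, data):
    """Simulate one host write under a write-back policy with mirroring.

    All names are illustrative assumptions; caches and the drive are
    modeled as plain dicts keyed by logical block address.
    """
    events = ["accept host write command"]

    # Transfer host write data into cache lines in the primary's memory.
    primary_cache[lba] = data
    events.append("cache in primary")

    # Mirror the host write data to the alternate controller (3a).
    alternate_cache[lba] = data
    events.append("mirror to alternate")

    # The alternate confirms that the mirrored data was received (3b).
    if alternate_cache.get(lba) != data:
        raise RuntimeError("mirror not verified")
    events.append("mirror verified")

    # Status goes back to the host BEFORE the data reaches the drive.
    events.append("status to host")

    # Finally, flush the dirty data to the system drive.
    # Assumption: cached copies are dropped once the data is on disk.
    system_drive[lba] = primary_cache.pop(lba)
    alternate_cache.pop(lba, None)
    events.append("flush to system drive")
    return events
```

The point of the sketch is the ordering: the host sees completion after the mirror is verified but before the flush, which is why the mirrored copy is what protects the data if a controller fails in between.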
This method is an efficient and effective means of providing fault tolerance for some, but not all, cases. For example, host write commands 106 that encompass an entire RAID stripe may be handled differently. Frequently, host writes that fill a full RAID stripe will, with reasonable probability, go on to fill multiple stripes. Rather than use memory and bus bandwidth transferring the write data to another controller to make a mirror copy, a decision may be made on efficiency grounds to transfer the full RAID stripe to the system drive as one large transfer to each drive. In such circumstances, fault tolerance may be sacrificed under the traditional approach.
FIG. 2 illustrates a situation in which a host write command 106 dirties all data segments of an entire RAID stripe for a system drive configured as a 4+1 RAID 5 system drive 110 utilizing an 8192 (8K) byte stripe size. A 4+1 RAID 5 system drive is a system drive with four data drives (112, 114, 116 and 118) and one parity drive (120). The host write command 106, in this example, transfers 64 blocks of host write data 104 into four cache lines 132 (e.g. 132-1, 132-2, 132-3 and 132-4) defined in primary controller memory 130 of primary controller 102. Four cache lines 132 are used in order to correspond to each of the four data drives 112, 114, 116 and 118. Since a block is equal to approximately one sector (512 bytes), each data drive (112, 114, 116, 118) supports a data stripe size of 16 blocks (8192/512). In addition, since the system drive 110 is using five drives (112, 114, 116, 118 and 120) in a 4+1 RAID 5 configuration, transferring 64 blocks to sector 0 results in 16 (64/4) blocks written to each data drive (112, 114, 116, 118) and 16 blocks written to parity drive 120. Therefore, the host write command will fill each data drive data stripe, thereby dirtying all blocks for the entire RAID 5 stripe.
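The arithmetic in this example can be checked with a short sketch. It assumes that the 8K stripe size means 8192 bytes per drive and that one block is exactly 512 bytes; the constant and function names are hypothetical and chosen only for illustration.

```python
SECTOR_BYTES = 512         # assumption: one block is one 512-byte sector
STRIPE_UNIT_BYTES = 8192   # assumption: 8K stripe size per drive
DATA_DRIVES = 4            # 4+1 RAID 5: four data drives plus one parity drive


def blocks_per_drive(stripe_unit_bytes=STRIPE_UNIT_BYTES,
                     sector_bytes=SECTOR_BYTES):
    """Blocks held by one drive's portion of a stripe."""
    return stripe_unit_bytes // sector_bytes


def full_stripe_blocks(data_drives=DATA_DRIVES):
    """Host data blocks needed to dirty every data drive in one stripe."""
    return data_drives * blocks_per_drive()


def blocks_written(host_blocks, data_drives=DATA_DRIVES):
    """Return (blocks per data drive, parity blocks) for a write that
    starts on a stripe boundary and is spread evenly across the data drives."""
    per_drive = host_blocks // data_drives
    return per_drive, per_drive
```

With these values, `blocks_per_drive()` gives 16 and `full_stripe_blocks()` gives 64, so the 64-block host write in the example fills the stripe exactly.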
One problem with this traditional approach is that whole RAID stripes become dirty as a result of a large sequential write from the host. In this case each write is mirrored, but the cache fills quickly, and the space occupied by the data that was just mirrored is needed for new host write data 104 arriving from the host. Older data must therefore be flushed to the system drive 110.
Therefore, there remains a need to overcome the above limitations in the existing art which is satisfied by the inventive structure and method described hereinafter.
The present invention overcomes the identified problems by providing an adaptive write policy for handling host write commands to write-back system drives in a dual active controller environment. The present invention provides an inventive method and apparatus for determining whether it is more efficient to flush the host write data directly to a system drive in accordance with a write-through policy, or to mirror the host write data and then flush the data to the system drive in accordance with a write-back policy.
In accordance with one embodiment of the invention, a method for an adaptive write policy in a data storage system is described, where the data storage system includes a host system connected to a primary controller and an alternate controller. The primary and alternate controllers are also connected to a system drive that includes one or more disk storage devices, such as a plurality of hard disk drives or other storage devices configured as a disk or storage array. A Redundant Array of Independent Disks (RAID) based storage system or RAID array is one example of such a storage array. The primary controller is connected to a first memory, and the alternate controller is connected to a second memory. The primary and alternate controllers manage the data storage system in a dual active configuration.
In accordance with this method, the primary controller receives a host write command from the host system. The host write command includes host write data to be written by the primary controller to the system drive. When the system drive is configured with a write-back policy, the primary controller determines whether the host write command encompasses an entire RAID stripe. If the host write command encompasses an entire RAID stripe, the primary controller processes the host write command in accordance with a write-through policy. Otherwise, the primary controller processes the host write command in accordance with a write-back policy.
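The decision described in this method can be sketched as follows. This is a hedged illustration under stated assumptions, not the claimed implementation: the function names, the policy strings, and the interpretation of "encompasses an entire RAID stripe" as "covers at least one complete, boundary-aligned stripe" are all assumptions made for the example.

```python
def encompasses_full_stripe(start_block, block_count, stripe_blocks):
    """True if the write covers at least one whole stripe.

    Assumption: a stripe spans stripe_blocks consecutive host blocks and
    stripes begin at multiples of stripe_blocks.
    """
    # First stripe boundary at or after the start of the write.
    first_boundary = -(-start_block // stripe_blocks) * stripe_blocks
    return start_block + block_count >= first_boundary + stripe_blocks


def choose_policy(drive_policy, start_block, block_count, stripe_blocks):
    """Pick the policy used to process one host write command."""
    if drive_policy != "write-back":
        # The adaptive step only applies to write-back system drives.
        return drive_policy
    if encompasses_full_stripe(start_block, block_count, stripe_blocks):
        return "write-through"   # flush directly, skip the mirror
    return "write-back"          # mirror first, flush later
```

For the 4+1 RAID 5 example with 64 data blocks per stripe, `choose_policy("write-back", 0, 64, 64)` selects write-through, while a 63-block write at the same address stays write-back.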
In a further embodiment an exemplary controller is described for connecting to a system drive that includes one or more disk storage devices, and for connecting to a host system. The controller is connected to a memory that has a cache line descriptor data structure defined therein. The cache line descriptor data structure is used by the controller to determine whether the host write command encompasses an entire RAID stripe.
In this embodiment, the cache line descriptor data structure includes information about a number of memory tracks allocated for each host write command; a physical disk of a RAID stripe wherein each first memory track is assigned; an offset number of each first memory track; and a block count for each memory track.
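One way to picture such a descriptor is the sketch below. The text names only the kinds of information held; the class names, field names, and the `covers_full_stripe` check are hypothetical, and the check assumes that a stripe is full when every data drive has a track starting at offset 0 with a block count equal to that drive's stripe size.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MemoryTrack:
    physical_disk: int   # data drive of the RAID stripe this track maps to
    offset: int          # starting block offset within that drive's stripe
    block_count: int     # number of blocks held by this track


@dataclass
class CacheLineDescriptor:
    # Memory tracks allocated for one host write command.
    tracks: List[MemoryTrack] = field(default_factory=list)

    @property
    def track_count(self) -> int:
        """Number of memory tracks allocated for the host write command."""
        return len(self.tracks)

    def covers_full_stripe(self, data_drives: int, blocks_per_drive: int) -> bool:
        """Assumed check: every data drive is completely filled by a track."""
        filled = {
            t.physical_disk
            for t in self.tracks
            if t.offset == 0 and t.block_count == blocks_per_drive
        }
        return filled.issuperset(range(data_drives))
```

For the 4+1 RAID 5 example, a descriptor with four tracks of 16 blocks each at offset 0, one per data drive, would report a full stripe; shortening any one track to 15 blocks would not.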
In a further embodiment, a data storage system providing an adaptive write policy is described. The data storage system includes a first and a second controller operatively coupled to a host system and a system drive. The system drive includes one or more disk storage devices. The first and second controllers each have an identical respective memory layout, with the first controller being operatively coupled to a first memory and the second controller being operatively coupled to a second memory. The first and second memory each have a respective data structure defined therein, where the data structure includes a cache line descriptor data structure.
In this embodiment, the cache line descriptor data structure includes information about a number of memory tracks allocated for each host write command; a physical disk of a RAID stripe wherein each first memory track is assigned; an offset number of each first memory track; and a block count for each memory track.
The invention provides many advantages over known techniques. Advantages of the invention include a new approach that adds a decision-making step to the traditional approach of handling host write commands to write-back system drives in a dual active controller environment. This decision optimizes bus utilization by reducing the amount of host write data that has to be mirrored to the alternate controller and by forcing the host write data directly to the system drive for all full RAID stripe writes. The new step determines whether it is more efficient to flush the data directly to the storage medium or to mirror the user data and then flush the data to the system drive.