The need to store digital files, documents, pictures, images and other data continues to increase rapidly. In connection with the electronic storage of data, systems incorporating more than one storage device have been devised. In general, using a number of storage devices in a coordinated fashion in order to store data can increase the total storage volume of the system. In addition, data can be distributed across the multiple storage devices such that data will not be irretrievably lost if one of the storage devices (or in some cases more than one storage device) fails. An additional advantage that can be achieved by coordinating operation of a number of individual storage devices is improved data access and/or storage times. Examples of systems that can provide such advantages can be found in the various RAID (redundant array of independent disks) levels that have been developed.
High availability is a key concern because in many applications users rely heavily on the data stored on the RAID system. In these types of applications, unavailability of data stored on the RAID system can result in significant loss of revenue and/or customer satisfaction. Employing a RAID system in such an application enhances availability of the stored data, since if a single disk drive fails, data may still be stored to and retrieved from the system. In addition to the use of a RAID system, it is common to use redundant RAID controllers to further enhance the availability of such a storage system. In such a situation, two or more controllers are used such that, if one of the controllers fails, the remaining controller will assume operations for the failed controller. The availability of the storage system is therefore enhanced, because the system can sustain a failure of a controller and continue to operate. When using dual controllers, each controller may conduct independent read and write operations simultaneously. This is known as an active-active configuration. In an active-active configuration, write-back data and associated parity data are mirrored between the controllers.
In a system using two controllers, data sent from the host to be written to the disk array is typically sent to either the first active controller or the second active controller. Where the data is sent depends upon the location in the disk array to which the data will be written. In active-active systems, typically one controller is zoned to a specific array of drives or a specific area, such as a partition or logical unit number (LUN). Thus, if data is to be written to the array or array partition that the first active controller is zoned to, the data is sent to the first active controller. Likewise, if the data is to be written to an array or array partition that the second active controller is zoned to, the data is sent to the second active controller. In order to maintain redundancy between the two controllers, the data sent to the first active controller must be copied onto the second active controller. Likewise, the data sent to the second active controller must be copied onto the first active controller.
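The routing and mirroring just described can be illustrated with a minimal sketch. The `Controller` class, its field names and the `route_write` function are hypothetical names chosen for illustration; they assume a simple in-memory model in which each controller keeps one dictionary for its own write-back data and one for the partner's mirrored data.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    """Hypothetical model of one controller in an active-active pair."""
    name: str
    owned_luns: set
    write_back: dict = field(default_factory=dict)  # data for LUNs zoned to this controller
    mirror: dict = field(default_factory=dict)      # copy of the partner's write-back data


def route_write(controllers, lun, block, data):
    """Send a host write to the controller zoned to `lun`, then mirror it to the partner."""
    owner = next(c for c in controllers if lun in c.owned_luns)
    partner = next(c for c in controllers if c is not owner)
    owner.write_back[(lun, block)] = data   # cached by the owning controller
    partner.mirror[(lun, block)] = data     # copied to the partner for redundancy
    return owner.name


ctrl_a = Controller("A", {0, 1})
ctrl_b = Controller("B", {2, 3})
owner_name = route_write([ctrl_a, ctrl_b], lun=2, block=7, data=b"payload")
```

In this example the write targets LUN 2, which is zoned to controller B, so B caches the data and A holds the mirrored copy.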
When a controller in an active-active controller pair suffers a failure, the other active controller recognizes the failure and takes control of the write operations of the failed controller. This may include the surviving controller determining whether the failed controller had data writes outstanding. If data writes are outstanding, the surviving controller issues a command to write the new data and parity to the target array or array partition. Furthermore, following the failure of a controller, the surviving controller can perform new write operations that would normally have been handled by the failed controller.
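A minimal sketch of this takeover sequence follows, under stated assumptions: the `Controller` class and the `take_over` function are hypothetical, the disk array is modeled as a plain dictionary, and parity calculation is omitted for brevity. The outstanding writes of the failed controller are assumed to be exactly the entries held in the survivor's mirror region.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    """Hypothetical controller model; `mirror` holds the partner's unwritten data."""
    name: str
    owned_luns: set
    mirror: dict = field(default_factory=dict)
    alive: bool = True


def take_over(survivor, failed, disk):
    """Complete the failed controller's outstanding writes and assume its LUNs."""
    failed.alive = False
    # Any entry still in the mirror region is an outstanding write of the partner.
    flushed = len(survivor.mirror)
    for key, data in survivor.mirror.items():
        disk[key] = data            # parity update omitted in this sketch
    survivor.mirror.clear()
    # New writes to the failed controller's LUNs are now handled by the survivor.
    survivor.owned_luns |= failed.owned_luns
    failed.owned_luns = set()
    return flushed


survivor = Controller("A", {0, 1}, mirror={(2, 7): b"new data"})
failed = Controller("B", {2, 3})
disk = {}
flushed = take_over(survivor, failed, disk)
```

After the call, the one outstanding write has reached the disk model and controller A owns all four LUNs.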
In order to provide high input/output (IO) performance, a typical RAID controller has a large cache, often in the range of gigabytes, such as 1, 2, 4, 8 or 20 gigabytes. Typically, half of the cache is dedicated as read cache, and the other half is dedicated as write cache. For a redundant controller implementation where dual controllers exist in a storage system, half of the write cache is used as write-back cache for LUNs/arrays owned by the local controller, and the other half is used to mirror the cache data from the partner controller.
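The arithmetic of this split can be shown with a short sketch. The function name `dual_controller_layout` and the dictionary keys are hypothetical, and the example assumes the even half-and-half division described above; actual controllers may divide the cache differently.

```python
GIB = 1024 ** 3


def dual_controller_layout(total_bytes):
    """Split a controller cache for dual-controller operation (assumed even split)."""
    read = total_bytes // 2             # half of the cache serves reads
    write = total_bytes - read          # the other half serves writes
    local_write_back = write // 2       # write-back data for locally owned LUNs/arrays
    mirror = write - local_write_back   # mirrored data from the partner controller
    return {"read": read, "write_back": local_write_back, "mirror": mirror}


layout = dual_controller_layout(4 * GIB)
```

For a 4 gigabyte cache, this yields 2 gigabytes of read cache, 1 gigabyte of local write-back cache and 1 gigabyte of mirror cache.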
Because of the advantages of providing redundant controllers, data storage systems often provide two slots for receiving controllers, such as RAID controllers. If a user desires, the data storage system can be operated using only a single controller, for example if cost constraints prevent or dissuade the user from provisioning the data storage system with two controllers. The data storage system may then be upgraded to dual redundant controller operation by adding a second controller at a later time. However, upgrading a system originally deployed as a single controller system to operate using two controllers, thereby providing redundant controller operation, typically requires that the storage system be taken offline. In particular, in a single controller system, all of the data storage devices are created and owned by (or zoned to) that controller. When a new controller is inserted into the system, all of the storage devices remain owned by the original controller. Accordingly, only newly created or added devices may be owned by (or zoned to) the second controller.
In particular, in a standalone configuration, a single controller is running in the data storage system. In such a controller, the cache is segmented, half for read and half for write operations. In a standalone controller system, there is no need or place to mirror the data. Therefore, the entire write cache of the single controller is allocated as write-back cache for the LUNs/arrays owned by the controller. If a second controller is added to a single controller system, the original controller is unable to mirror the data from the newly inserted controller, because all of the write cache of the original controller has been allocated to write operations involving the LUNs/arrays owned by that controller. As a result, in order to enable active-active redundant controller operation, the original controller must go through a shutdown and reboot operation to flush the cache data in order to re-segment the cache into a write-back region that includes a primary write-back region and a mirror write-back region. This process disrupts the normal system operation and may not be acceptable in certain applications, such as video streaming.
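The limitation can be illustrated with a minimal sketch of a standalone controller's write cache. The `StandaloneWriteCache` class and its method names are hypothetical, sizes are counted in abstract blocks, and the `flush_and_resegment` method only models the effect of the shutdown and reboot (flushing cached data, then dividing the write cache evenly), not the reboot itself.

```python
class StandaloneWriteCache:
    """Hypothetical write cache of a controller deployed in standalone mode."""

    def __init__(self, size_blocks):
        self.size = size_blocks
        self.write_back_size = size_blocks  # entire write cache is write-back
        self.mirror_size = 0                # no region reserved for a partner
        self.dirty = {}                     # cached writes not yet on disk

    def can_mirror(self, n_blocks):
        """Return True if partner data of this size fits in the mirror region."""
        return n_blocks <= self.mirror_size

    def flush_and_resegment(self, disk):
        """Model the effect of the reboot: flush cached data, then split the cache."""
        disk.update(self.dirty)             # all cached writes must reach disk first
        self.dirty.clear()
        self.write_back_size = self.size // 2
        self.mirror_size = self.size - self.write_back_size


cache = StandaloneWriteCache(size_blocks=8)
cache.dirty[(0, 1)] = b"x"
mirror_ok_before = cache.can_mirror(1)      # no mirror region exists yet
disk = {}
cache.flush_and_resegment(disk)
mirror_ok_after = cache.can_mirror(4)       # half of the write cache now mirrors
```

Before re-segmentation the cache cannot accept even a single mirrored block; afterward, half of the write cache is available for the partner's data, at the cost of the disruptive flush.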