1. Field of the Invention
The present invention relates generally to fault-tolerant data management, and more particularly to systems and methods for managing snapshots of metadata in a dual controller environment.
2. Description of the Related Art
There has been an increasing demand for access to high performance and fault-tolerant data storage to keep pace with advances in computing infrastructures. While the cost of storage devices such as hard disk drives (HDDs) has been plummeting due to manufacturing improvements, the cost of managing data storage has risen steadily. Storage management has become critical to many enterprises that rely on online access to operational and historic data in their day-to-day business operations.
However, HDDs are prone to failure of their electromechanical components. Hence, storage systems that include many HDDs need to have redundancy built into them, to avoid data loss when an HDD fails. One popular technique for avoiding the loss of data from HDD failure is known as Redundant Array of Independent Disks (RAID), which is a class of algorithms that store data redundantly on an array of HDDs.
Since RAID algorithms add redundancy to user data and decide data layout on the HDDs, they are executed on a dedicated hardware storage controller in order to free the host processor-memory complex from the task of executing these algorithms. Such a controller typically includes a dedicated processor and memory, as well as Application Specific Integrated Circuits (ASICs), which perform Exclusive OR (XOR) parity calculations, protocol processing, etc. In RAID systems a host machine communicates with this hardware either through the system bus (in which case the storage controller is called a RAID adapter) or via a storage interconnect like Small Computer System Interface (SCSI) (in which case the hardware is called a RAID controller). HDDs connected to the controller are mapped to logical drives that are created via configuration commands sent to the controller by an application. A logical drive is a storage extent that is externalized by the controller to its host and resembles an extent on an HDD. The RAID controller, depending on the RAID level chosen for a logical drive, decides the location of redundant data and whether it needs to be updated.
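The XOR parity calculation mentioned above can be illustrated with a minimal sketch. It assumes equally sized blocks held in memory as byte strings; the function names `xor_blocks` and `rebuild_missing` are hypothetical and are not taken from any particular controller implementation.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equally sized blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover one lost data block from the survivors and the parity block."""
    return xor_blocks(list(surviving_blocks) + [parity])


# Hypothetical stripe of three data blocks (one per HDD) plus parity.
data = [b"\x01\x02", b"\x0f\xf0", b"\xaa\x55"]
parity = xor_blocks(data)
recovered = rebuild_missing([data[0], data[2]], parity)
```

Because XOR is its own inverse, the same routine that produces the parity block also rebuilds any single lost block from the remaining ones.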
Unfortunately, RAID adapters and RAID controllers, like HDDs, are also subject to failure. One way to address the possible failure of a data storage controller is to provide a second controller. The operation of a storage system using two controllers is referred to as "dual-mode", while operation with only a single controller is called "simplex-mode". In dual-mode, each storage controller contains metadata that defines the current mapping of the logical drives onto a physical disk. However, if one controller fails there is the possibility of a loss of this metadata. This is a particular concern in systems that use fault tolerant metadata snapshots.
Snapshots are a high level feature of storage subsystems that allow a logical copy of a source drive to be copied onto a target drive instantaneously. A snapshot of data at a time "t" creates, in a target data volume, a logical copy of data in a source data volume. Physical copying of the data from the source volume to the target volume can then subsequently take place. Any intervening changes ("writes") to data in the source volume are momentarily delayed while the original version of the data sought to be changed is preferentially copied from the source volume to the target volume. Thus, the snapshot of data in the target volume represents the exact state of the data in the source volume at the time "t".
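The snapshot behavior described above can be sketched as follows. This is an illustrative in-memory model only: the class name `SnapshotPair`, its methods, and the representation of a volume as a Python list of blocks are all assumptions made for the example.

```python
class SnapshotPair:
    """Logical copy of a source volume as of the moment the object is created."""

    def __init__(self, source):
        self.source = source   # list of blocks; later writes modify it in place
        self.target = {}       # block index -> original block preserved on the target

    def write_source(self, index, data):
        """Delay the write until the original block has been copied to the target."""
        if index not in self.target:
            self.target[index] = self.source[index]
        self.source[index] = data

    def background_copy(self, index):
        """Physical copying that takes place after the snapshot was taken."""
        if index not in self.target:
            self.target[index] = self.source[index]

    def read_snapshot(self, index):
        """Return the block as it was at snapshot time."""
        if index in self.target:
            return self.target[index]
        # Not yet copied, so the source still holds the original version.
        return self.source[index]


source = [b"A", b"B", b"C"]
pair = SnapshotPair(source)
pair.write_source(1, b"X")      # original b"B" is preserved first
pair.background_copy(0)
pair.write_source(0, b"Y")      # already copied, so the write proceeds directly
snapshot_view = [pair.read_snapshot(i) for i in range(3)]
```

In this sketch the snapshot view stays fixed at the state of the source at creation time, while the source itself continues to accept writes.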
Snapshots, as defined above, are useful for backing up data and for testing. For example, taking a snapshot of frequently changing data facilitates the execution of test applications against the snapshot of the data, without the test application execution being unduly interfered with by changes to the data. Moreover, the snapshot mechanism facilitates faster data backups by a storage subsystem as compared to file system-based backups, which entail host CPU processing and which require the allocation of relatively high network bandwidth.
In general, the snapshot mechanism's data structures must keep track of what has been copied over from the source drive to the target drive, so that the mechanism returns the data from the correct location. If data is to be written to the source drive, a copy must first be initiated from the source to the target before allowing the write to proceed. To keep track of which data has been copied over from the source to the target, the source drives are generally divided into segments. The size of each segment is generally fixed throughout the storage subsystem and is determined by latency issues. In particular, if the segment size is too large, then reading a byte of data will incur the latency of the transfer of a full segment's worth of data. In some systems, the segments are organized into segment ranges to facilitate the tracking of the status of snapshots of segments. Each segment in a range has the same property: within a given range, either all of the segments, or none of the segments, have been copied over.
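One way to picture the segment-range bookkeeping described above is the following sketch. The class `SegmentRangeMap` and its tuple layout `(start, end, copied)` are hypothetical choices for illustration; actual systems may organize ranges quite differently.

```python
class SegmentRangeMap:
    """Tracks copied segments as ranges in which every segment shares one status."""

    def __init__(self, segment_count):
        # One range covering all segments, none of which has been copied yet.
        # Each range is (start, end, copied) with `end` exclusive.
        self.segment_count = segment_count
        self.ranges = [(0, segment_count, False)]

    def is_copied(self, segment):
        for start, end, copied in self.ranges:
            if start <= segment < end:
                return copied
        raise IndexError(segment)

    def mark_copied(self, segment):
        if not 0 <= segment < self.segment_count:
            raise IndexError(segment)
        updated = []
        for start, end, copied in self.ranges:
            if start <= segment < end and not copied:
                # Split the uncopied range around the newly copied segment.
                if start < segment:
                    updated.append((start, segment, False))
                updated.append((segment, segment + 1, True))
                if segment + 1 < end:
                    updated.append((segment + 1, end, False))
            else:
                updated.append((start, end, copied))
        self.ranges = self._merge(updated)

    @staticmethod
    def _merge(ranges):
        """Join adjacent ranges that share the same copied status."""
        merged = []
        for start, end, copied in ranges:
            if merged and merged[-1][1] == start and merged[-1][2] == copied:
                merged[-1] = (merged[-1][0], end, copied)
            else:
                merged.append((start, end, copied))
        return merged


status = SegmentRangeMap(8)
status.mark_copied(3)
status.mark_copied(4)
```

After the two calls, segments 3 and 4 form a single copied range, and the segments on either side remain in uncopied ranges.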
In dual-mode, each storage controller has control of a different portion of the configuration space that maps the logical drives onto the physical disks. Hence, the snapshot metadata in each controller changes independently of the metadata in the other controller. As a result, if one controller fails, the other controller will not have the current state of the snapshot metadata that resides in the failed controller. Thus there is a need for a way to manage snapshot metadata in a dual-mode data storage system to prevent the loss of snapshot metadata in the event that one of the controllers fails. There is also a need to accomplish this kind of management without significantly adding to the total number of messages exchanged between the controllers.
The present invention has carefully considered the above problems and has provided the solution set forth herein.
A computer-implemented method is disclosed for managing data snapshots among source and target storage volumes in a dual-controller environment. The method includes establishing a configuration space that maps the source and target storage volumes to first and second logical drives respectively. The configuration space is then divided into first and second portions, wherein a first storage controller controls the first portion and a second storage controller controls the second portion. A snapshot relationship is established between the source and target storage volumes such that portions of data on the source storage volume are logically mirrored on the target storage volume. The identical snapshot relationship is stored in both the first and second storage controllers.
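The steps of the method summarized above can be sketched in a few lines. All names here (`Controller`, `SnapshotRelationship`, `divide_configuration_space`, `establish_snapshot`) and the alternating assignment policy are assumptions made for illustration; the text does not prescribe how the configuration space is divided.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SnapshotRelationship:
    source_drive: str
    target_drive: str


@dataclass
class Controller:
    name: str
    owned_drives: set = field(default_factory=set)      # portion of the configuration space
    relationships: dict = field(default_factory=dict)   # source drive -> relationship


def divide_configuration_space(logical_drives, first, second):
    """Split the logical drives between two controllers (hypothetical policy)."""
    for position, drive in enumerate(sorted(logical_drives)):
        owner = first if position % 2 == 0 else second
        owner.owned_drives.add(drive)


def establish_snapshot(config_space, source_volume, target_volume, controllers):
    """Create the relationship and store the identical record in every controller."""
    relationship = SnapshotRelationship(
        source_drive=config_space[source_volume],
        target_drive=config_space[target_volume],
    )
    for controller in controllers:
        controller.relationships[relationship.source_drive] = relationship
    return relationship


# Configuration space: storage volume -> logical drive (names are made up).
config_space = {"source_volume": "LD0", "target_volume": "LD1"}
first, second = Controller("first"), Controller("second")
divide_configuration_space(config_space.values(), first, second)
relationship = establish_snapshot(
    config_space, "source_volume", "target_volume", (first, second)
)
```

Each controller owns only its own portion of the configuration space, yet both hold the same snapshot relationship record.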
In accordance with one aspect of the invention, in response to a request to perform an operation that results in a modification of the snapshot relationship, the operation is first performed, and then a modified snapshot relationship is transferred to both the first and second controllers. The modified snapshot relationship reflects the operation, thereby synchronizing the snapshot relationship data in the first and second controllers.
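The ordering described in this aspect (perform the operation, then transfer the modified relationship to both controllers) can be sketched as follows. The `Controller` class, the representation of snapshot metadata as a set of copied segment numbers, and the `copy_segment` callback are hypothetical stand-ins chosen for the example.

```python
class Controller:
    def __init__(self, name):
        self.name = name
        # Relationship id -> set of segment numbers already copied to the target.
        self.snapshot_metadata = {}


def perform_and_synchronize(owner, peer, relationship_id, segment, copy_segment):
    """Perform the operation first, then give both controllers the modified state."""
    # Step 1: carry out the operation that changes the snapshot relationship.
    copy_segment(segment)

    # Step 2: build the modified relationship that reflects the operation.
    current = owner.snapshot_metadata.get(relationship_id, set())
    modified = set(current) | {segment}

    # Step 3: transfer the modified relationship to both controllers.
    # Each controller receives its own copy so later changes cannot alias.
    for controller in (owner, peer):
        controller.snapshot_metadata[relationship_id] = set(modified)


first, second = Controller("first"), Controller("second")
copied_log = []  # stands in for the physical source-to-target copy
perform_and_synchronize(first, second, "snap0", 5, copied_log.append)
perform_and_synchronize(first, second, "snap0", 6, copied_log.append)
```

In this model, if either controller were to fail after the call returns, the survivor would already hold the current snapshot metadata.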
The details of the present invention, both as to its structure and operation, can best be understood in reference to the accompanying drawings, in which like reference numerals refer to like parts, and in which: