The present invention is related to storing data in a multiple controller configuration, and in particular, to the mirroring of data using direct memory access engines.
Network storage controllers are typically used to connect a host computer system with peripheral storage devices, such as disk drives or tape drives. The network storage controller acts as an interface between the host computer and the peripheral storage devices. In many applications, the network storage controller performs processing functions on the data transferred between the host computer and peripheral devices. One common application of such a system is a redundant array of inexpensive disks (RAID). A RAID system stores data on multiple disk drives to protect the data against disk drive failure. If one disk drive fails, then the RAID system is generally able to reconstruct the data which was stored on the failed drive from the remaining drives in the array. A RAID system uses a network storage controller, which in many cases includes a RAID controller, as an interface between the host computer and the array of disk drives.
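By way of illustration only, the reconstruction described above can be sketched for a parity-based RAID level, in which a parity block is the byte-wise exclusive OR (XOR) of the corresponding data blocks. The drive contents, the number of drives, and the helper name `xor_blocks` are assumptions made for this sketch and are not taken from any particular RAID implementation.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data drives, each holding one four-byte block.
data_drives = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]

# The parity drive stores the XOR of the data blocks.
parity = xor_blocks(data_drives)

# Simulate the loss of drive 1 and rebuild its block from the
# surviving data blocks and the parity block.
failed = 1
survivors = [d for i, d in enumerate(data_drives) if i != failed]
rebuilt = xor_blocks(survivors + [parity])
```

Because XOR is its own inverse, combining the surviving blocks with the parity block yields the missing block. Other RAID levels, such as those based on mirroring, recover data differently.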
Many applications require a storage system to have very high availability. This high availability is a key concern in many applications, such as financial institutions and airline reservations systems, because the users rely heavily on the data stored on the RAID system. In these types of applications, unavailability of data stored on the RAID system can result in significant loss of revenue and/or customer satisfaction. Employing a RAID system in such an application enhances availability of the stored data, since if a single disk drive fails, data may still be stored and retrieved from the system. In addition to the use of a RAID system, it is common to use redundant RAID controllers to further enhance the availability of a storage system. In such a situation, two or more controllers are used in a RAID system, with each controller having failover capability, where if one of the controllers fails the other remaining controller will assume operations for the failed controller. Such a platform enhances the availability of a RAID system; however, it can lead to several disadvantages, as will be discussed below.
FIG. 1 shows a block diagram representation of a dual controller configured RAID network storage controller 10, showing a fibre channel to fibre channel connection. That is, in this example, the host computer and the array of disk drives both communicate with the network storage controller using fibre channel connections. While fibre channel is a common channel medium in such systems, it should be understood that other channels may also be used, such as, for example, Small Computer System Interface (SCSI) or Ethernet. The RAID system shown in FIG. 1 includes two host ports, host port-1 14 and host port-2 18, and two disk ports, disk port-1 22 and disk port-2 26. Each host port 14, 18 may be zoned to different host computers, and each disk port 22, 26 may be zoned to different disk arrays, as is common in RAID systems and is well known in the art. The network storage controller 10 includes dual RAID controllers, controller-A 30 and controller-B 34. In a system employing zoning of controllers, controller-A 30 may be zoned to host port-1 14 and disk port-1 22, and controller-B 34 may be zoned to host port-2 18 and disk port-2 26.
As is understood in the art, systems which employ dual controllers require data mirroring between controllers to maintain cache coherency. Each controller 30, 34, must have a copy of the data and status of the other controller in order to maintain redundancy between the controllers and thus maintain operation of the RAID system if one controller fails. Mirroring data between controllers can decrease the performance of a RAID system because transferring data between controllers uses processing resources of the controllers, as well as channel bandwidth, as will be discussed in more detail below.
The controllers 30, 34 are connected to a fibre channel bus 38, which is connected to two IO modules, IO module-1 42 and IO module-2 46. Each controller 30, 34 includes a CPU subsystem 50, a double data rate (DDR) memory 54, control logic 58, a dual port fibre channel connection with two host ports 62a, 62b and a dual port fibre channel connection with two disk ports 66a, 66b. The CPU subsystem 50 performs tasks required for storage of data onto an array of disks, including striping data, and initiating and executing read and write commands. The DDR memory 54 is a nonvolatile storage area for data and other information. The control logic 58 performs several functions, such as interfacing with the CPU subsystem 50, DDR memory 54, and the host ports 62a, 62b and the disk ports 66a, 66b. The control logic 58 may also have other functions, including a parity generation function, such as an exclusive OR (XOR) engine. The host ports 62a, 62b and disk ports 66a, 66b provide communications with the fibre channel backplane 38. The IO modules 42, 46 include link resiliency circuits (LRCs) 70, also known as port bypass circuits, which function to connect each host port 14, 18 and each disk port 22, 26 to each controller 30, 34. This allows both controllers 30, 34 to have access to both host ports 14, 18 and both disk ports 22, 26.
In order to provide full redundancy, each controller must have a connection to each host port 14, 18 and each disk port 22, 26. This way, if there is a failure of one of the controllers, the other controller can continue operations. However, when using zoning techniques to enhance the performance of a RAID system, half of these ports are passive. For example, if controller-A 30 is zoned to host port-1 14 and disk port-1 22, then controller-A 30 receives all communications from host port-1 14 and controls the disk array(s) on disk port-1 22. Likewise, controller-B 34 would be zoned to host port-2 18 and disk port-2 26. These zoning techniques are well known in the art and can increase performance of the RAID system as well as simplify control and communications of the two controllers 30, 34. In the example of FIG. 1, on controller-A 30 the host port connection 62a and disk port connection 66a are connected to host port-1 14 and disk port-1 22, respectively, through the LRCs 70 of IO module-1 42. Because controller-A 30 is zoned to host port-1 14 and disk port-1 22, the host port connection 62a and disk port connection 66a actively communicate with host port-1 14 and disk port-1 22. The remaining host port connection 62b and disk port connection 66b are connected to host port-2 18 and disk port-2 26, respectively, through the LRCs 70 of IO module-2 46. These connections are typically passive connections, as controller-A 30 is not actively communicating with host port-2 18 and disk port-2 26, so long as controller-B 34 does not fail. Likewise, controller-B 34 would be zoned to host port-2 18 and disk port-2 26. Thus, on controller-B 34, the host port connection 62b and disk port connection 66b would communicate with host port-2 18 and disk port-2 26 through LRCs 70 of IO module-2 46. The remaining host port connection 62a and disk port connection 66a would be connected to host port-1 14 and disk port-1 22 through LRCs 70 of IO module-1 42.
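The zoning arrangement just described, and the role of the passive connections on failover, can be summarized in a short sketch. The table `ZONING` and the function `active_ports` are hypothetical names introduced for illustration; the sketch models only which ports a controller services, not the LRCs or the fibre channel connections themselves.

```python
# Hypothetical zoning table following the FIG. 1 example.
ZONING = {
    "controller-A": ["host port-1", "disk port-1"],
    "controller-B": ["host port-2", "disk port-2"],
}

def active_ports(failed=None):
    """Return the ports each working controller actively services.

    With no failure, each controller services only its zoned ports and
    its other connections stay passive. If one controller fails, the
    survivor's passive connections become active and it services the
    failed controller's ports as well.
    """
    survivors = [name for name in ZONING if name != failed]
    ports = {name: list(ZONING[name]) for name in survivors}
    if failed is not None:
        for name in survivors:
            ports[name].extend(ZONING[failed])
    return ports

normal = active_ports()
after_failure = active_ports(failed="controller-B")
```

In normal operation each controller services two of the four ports; after the assumed failure of controller-B, controller-A services all four.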
As mentioned above, in typical redundant controller operations data is mirrored between controllers. When mirroring data between controller-A 30 and controller-B 34, it is common to transfer the mirrored data over the shared disk port connections, namely disk port connection 66b of controller-A 30, and disk port connection 66a of controller-B 34. For example, controller-B 34 may receive data over host port-2 18 that is to be written to an array of drives over disk port-2 26. Controller-B 34 would receive this data and store it in memory 54. In order to maintain cache coherency, controller-B 34 must also communicate this data to controller-A 30, so that both controllers have the data, and if one fails the other is still able to write the data.
In a traditional system, this mirroring is accomplished over several steps. FIG. 12 is a flow chart representation of the steps required to mirror data between two controllers in an active/active controller pair. Initially, controller-B 34 receives data to be written to the disk array, as indicated by block 80. To mirror the data, controller-B 34 issues a first mirror command causing a first interrupt to controller-A 30, notifying controller-A 30 that a message is being sent, as noted by block 82. An interrupt is a signal sent to a processor, in this example the processor on controller-A 30, and generated automatically by controller hardware when a message from another controller, in this example controller-B 34, is received; it causes the processor to stop what it is doing and service the interrupt. When controller-A 30 receives the first interrupt, it discontinues any processing activity, and processes the first mirror command. Controller-B 34 next issues a second mirror command containing metadata, which causes a second interrupt, as indicated by block 84. The metadata contains the actual message body, and information showing controller-A 30 the memory location at which to store the user data. Next, controller-A 30 marks its nonvolatile memory (NVRAM) contents as invalid for the data blocks specified in the metadata, as indicated by block 86. Next, controller-B 34 issues a third mirror command containing user data, which causes a third interrupt, according to block 88. Controller-A 30 receives the user data, stores the user data in the specified location in its NVRAM, and marks the NVRAM contents as valid for the specified data blocks, as noted by block 90. Once controller-B 34 has completed the associated write operation, it then issues a fourth mirror command causing a fourth interrupt and a notification that the write is complete, as noted by block 92. Controller-A 30 then marks the write complete, as indicated by block 94.
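The sequence of blocks 80 through 94 can be modeled in a few lines to make the interrupt count explicit. This is a simplified sketch under stated assumptions: the class `PeerController`, the command names, and the dictionary-based NVRAM are hypothetical and stand in for controller hardware and firmware that the description does not specify.

```python
class PeerController:
    """Hypothetical model of the receiving controller (controller-A)."""

    def __init__(self):
        self.interrupts = 0          # times the processor was interrupted
        self.nvram = {}              # block address -> mirrored data
        self.valid = {}              # block address -> validity flag
        self.target_blocks = []      # addresses named in the metadata
        self.write_complete = False

    def on_mirror_command(self, kind, payload=None):
        # Every mirror command interrupts the receiving processor.
        self.interrupts += 1
        if kind == "metadata":
            # Block 86: mark the destination blocks invalid.
            self.target_blocks = list(payload)
            for addr in self.target_blocks:
                self.valid[addr] = False
        elif kind == "user_data":
            # Block 90: store the data, then mark the blocks valid.
            for addr, chunk in zip(self.target_blocks, payload):
                self.nvram[addr] = chunk
                self.valid[addr] = True
        elif kind == "write_complete":
            # Block 94: record that the sender finished its write.
            self.write_complete = True
        # kind == "notify" (block 82) only announces the message.

def mirror_write(peer, addresses, chunks):
    """Sending side (controller-B): four commands per mirrored write."""
    peer.on_mirror_command("notify")
    peer.on_mirror_command("metadata", addresses)
    peer.on_mirror_command("user_data", chunks)
    # The sender would complete its own write to the disk array here.
    peer.on_mirror_command("write_complete")

controller_a = PeerController()
mirror_write(controller_a, [7, 8], [b"abcd", b"efgh"])
```

In this model a single mirrored write leaves `controller_a.interrupts` at four, which is the per-write cost the following paragraphs identify as a drawback.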
As can be seen, while this mirroring technique is successful in copying data between controllers, it can use significant processing resources. Each write operation requires four interrupts, each of which causes the receiving processor to suspend any tasks it is currently processing and service the interrupt. Thus, it would be advantageous to have a network storage controller which consumes fewer processing resources for mirroring data.
Additionally, this mirroring is typically accomplished using the disk channels. In each of the mirror commands described above, controller-B 34 sends the data over the disk port connection 66a which connects to the LRC 70 connected to disk port-1 22. The data transfers through the LRC 70, where it is then received at the disk port connection 66a on controller-A 30. Controller-A 30 then receives the data and performs appropriate processing and storage steps. Likewise, if controller-A 30 receives data to be written to the array of disks on disk port-1 22, it sends the data to controller-B 34 using the same mirroring technique. Note that this technique does not require dedicated disk ports, and more than one disk port can be used.
While this uses the remaining disk port on each controller, the second host port on each controller remains unused, and thus passive, during normal operation of the system. The passive ports on each controller add a significant amount of hardware to the controller, and can add significant cost to the network storage controller 10. Thus, it would be advantageous to provide a redundant network storage controller which maintains high availability while reducing cost and hardware associated with passive ports located on the controllers.
Additionally, mirroring data in such a system results in the mirrored data and storage data being sent over the same port for the controller that is receiving the mirrored data or being used to transfer data to the disk. Bandwidth to and from the disk array is consumed by the mirrored data, which can reduce the performance of the network storage controller. Thus, it would be advantageous to have a network storage controller which consumes little or no disk channel bandwidth when mirroring data between controllers.
Furthermore, with the continually increasing demand for data storage, RAID controllers often require upgrades with additional disk drives or faster bus interfaces. However, a RAID controller may not be configured to add additional bus interface capacity or may not support a new type of bus interface. Such controllers commonly have to be replaced when an upgrade is performed. This replacement of controllers can increase the cost of upgrading a RAID system. The replacement of an operational RAID controller represents a loss in value that may inhibit the decision to upgrade a RAID system. Thus, it would be advantageous to have a system which can support upgrades of capacity, as well as new interface types, with ease and reduced cost.
Accordingly, there is a need to develop an apparatus and method for use in a network storage controller which: (1) provides redundancy with reduced cost for passive components, (2) reduces the amount of mirrored data which is sent over the disk or host ports, (3) reduces the processing overhead involved with mirroring data, and (4) provides easily replaceable and upgradeable components.
In accordance with the present invention, a method and apparatus are provided for mirroring data in a storage system including a storage array. The apparatus includes a first controller management module including a first processor and a first direct memory access engine. The first processor is used in controlling read operations and write operations involving the storage array. The first direct memory access engine is used in storing data received by the first controller management module. The apparatus also includes a second controller management module including a second processor and a second direct memory access engine. The second processor is used in controlling read operations and write operations involving the storage array. The second direct memory access engine can be used in transferring data from the second controller management module to the first controller management module. Data is mirrored from the first controller management module to the second controller management module using the first direct memory access engine while avoiding interruption of the second processor. The first direct memory access engine is separate from but in communication with the first processor, and the first processor controls mirroring of data using the first direct memory access engine. In one embodiment, the first controller management module includes a field programmable gate array. The first direct memory access engine is in communication with at least portions of the field programmable gate array, and the first direct memory access engine can be a part of the field programmable gate array.
In one embodiment, the apparatus includes a first channel interface module having a first shared path. The first channel interface module communicates with the first controller management module, and the first shared path is used in transferring data between the first controller management module and the second controller management module. A passive backplane interconnects the first channel interface module and the first controller management module. The second processor controls operations associated with the second controller management module while the data is being mirrored to the second controller management module. The data is mirrored to the second controller management module independently of the second direct memory access engine. Within the second controller management module, there is non-volatile memory, and data can be stored in the non-volatile memory independently of the second processor. The first direct memory access engine marks portions of the non-volatile memory where the data is to be stored as invalid, and transfers the data to the non-volatile memory. The portions of the non-volatile memory where the data is stored are then marked as valid.
The method includes mirroring data from the first controller management module to the second controller management module using the first direct memory access engine. The first processor within the first controller management module determines that data mirroring is to be conducted. The second processor within the second controller management module controls read and write operations involving the storage array, and the data mirroring is conducted while avoiding interruption of the second processor. Hence, the second processor can continue performing its own operations during the time that the data is being mirrored. The data mirroring is conducted using the first direct memory access engine without requiring the second direct memory access engine. During the mirroring, data is stored in non-volatile memory in the second controller management module. When conducting the mirroring, the first direct memory access engine is used to mark contents of the non-volatile memory that is to receive data as invalid and transfer the data to the non-volatile memory. The first direct memory access engine then marks the contents of the non-volatile memory that received the data as valid. In one embodiment, the first direct memory access engine is also used in determining parity for information stored on the storage array using the first controller management module.
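The ordering of the mirroring steps summarized above (mark invalid, transfer, mark valid) can be sketched as follows. The classes `ControllerModule` and `DmaEngine`, the dictionary-based non-volatile memory, and the interrupt counter are assumptions introduced purely for illustration; an actual direct memory access engine would be hardware, for example part of a field programmable gate array, and this model does not represent its interfaces.

```python
class ControllerModule:
    """Hypothetical model of a controller management module."""

    def __init__(self, name):
        self.name = name
        self.nvram = {}                # block address -> data
        self.valid = {}                # block address -> validity flag
        self.processor_interrupts = 0  # would count interrupts, if any occurred

class DmaEngine:
    """Hypothetical DMA engine belonging to the sending module."""

    def __init__(self, owner):
        self.owner = owner

    def mirror(self, peer, blocks):
        """Write blocks (address -> data) into the peer's non-volatile memory.

        The engine writes the peer's memory directly, so the peer's
        processor is never involved and its interrupt count is untouched.
        """
        # Step 1: mark the destination regions invalid before any data moves.
        for addr in blocks:
            peer.valid[addr] = False
        # Step 2: transfer the data.
        for addr, data in blocks.items():
            peer.nvram[addr] = data
        # Step 3: mark the regions valid only after the transfer finishes.
        for addr in blocks:
            peer.valid[addr] = True

first = ControllerModule("first")
second = ControllerModule("second")
engine = DmaEngine(first)
engine.mirror(second, {7: b"abcd", 8: b"efgh"})
```

Marking the destination invalid before the transfer means that a failure partway through leaves the affected regions flagged as invalid rather than holding partially written data that appears good. In this model the second module's interrupt count remains zero, in contrast to the four interrupts per write of the conventional technique described in the background.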