The present invention relates to mirror split operations conducted by data-storage device controllers and, in particular, to a method and system for decreasing the time needed for execution of a mirror split operation by preparing for the mirror split operation, following notification of an impending mirror split operation, and prior to reception of a request for the corresponding mirror split operation.
The present invention relates to mirror split operations conducted by the controllers of various types of data storage devices. The described embodiment relates to disk-array data-storage devices and disk-array controllers. Therefore, a concise background of disk and disk-array technologies is provided below.
FIG. 1 is a block diagram of a standard disk drive. The disk drive 101 receives I/O requests from remote computers via a communications medium 102 such as a computer bus, fibre channel, or other such electronic communications medium. For many types of storage devices, including the disk drive 101 illustrated in FIG. 1, the vast majority of I/O requests are either READ or WRITE requests. A READ request requests that the storage device return to the requesting remote computer some requested amount of electronic data stored within the storage device. A WRITE request requests that the storage device store electronic data furnished by the remote computer within the storage device. Thus, as a result of a READ request carried out by the storage device, data is returned via communications medium 102 to a remote computer, and as a result of a write request, data is received from a remote computer by the storage device via communications medium 102 and stored within the storage device.
The disk drive storage device illustrated in FIG. 1 includes controller hardware and logic 103 including electronic memory, one or more processors or processing circuits, and controller firmware, and also includes a number of disk platters 104 coated with a magnetic medium for storing electronic data. The disk drive contains many other components not shown in FIG. 1, including read/write heads, a high-speed electronic motor, a drive shaft, and other electronic, mechanical, and electromechanical components. The memory within the disk drive includes a request/reply buffer 105, which stores I/O requests received from remote computers, and an I/O queue 106 that stores internal I/O commands corresponding to the I/O requests stored within the request/reply buffer 105. Communication between remote computers and the disk drive, translation of I/O requests into internal I/O commands, and management of the I/O queue, among other things, are carried out by the disk drive I/O controller as specified by disk drive I/O controller firmware 107. Translation of internal I/O commands into electromechanical disk operations, in which data is stored onto, or retrieved from, the disk platters 104, is carried out by the disk drive I/O controller as specified by disk media read/write management firmware 108. Thus, the disk drive I/O controller firmware 107 and the disk media read/write management firmware 108, along with the processors and memory that enable execution of the firmware, compose the disk drive controller.
Individual disk drives, such as the disk drive illustrated in FIG. 1, are normally connected to, and used by, a single remote computer, although it has been common to provide dual-ported disk drives for use by two remote computers and multi-port disk drives that can be accessed by numerous remote computers via a communications medium such as a fibre channel. However, the amount of electronic data that can be stored in a single disk drive is limited. In order to provide much larger-capacity electronic data-storage devices that can be efficiently accessed by numerous remote computers, disk manufacturers commonly combine many different individual disk drives, such as the disk drive illustrated in FIG. 1, into a disk array device, increasing both the storage capacity and the capacity for parallel I/O request servicing by concurrent operation of the multiple disk drives contained within the disk array.
FIG. 2 is a simple block diagram of a disk array. The disk array 202 includes a number of disk drive devices 203, 204, and 205. In FIG. 2, for simplicity of illustration, only three individual disk drives are shown within the disk array, but disk arrays may contain many tens or hundreds of individual disk drives. A disk array contains a disk array controller 206 and cache memory 207. Generally, data retrieved from disk drives in response to READ requests may be stored within the cache memory 207 so that subsequent requests for the same data can be more quickly satisfied by reading the data from the quickly accessible cache memory rather than from the much slower electromechanical disk drives. Various elaborate mechanisms are employed to maintain, within the cache memory 207, data that has the greatest chance of being subsequently re-requested within a reasonable amount of time. The data contained in WRITE requests may also be stored first in cache memory 207, in the event that the data may be subsequently requested via READ requests or in order to defer slower writing of the data to physical storage medium.
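The read-caching behavior described above can be sketched in a few lines of Python. The sketch is illustrative only and is not part of the described embodiment: the class name `ReadCache`, the least-recently-used eviction policy, and the write-through handling of WRITE requests are assumptions chosen for brevity, standing in for the more elaborate mechanisms an actual disk array controller may employ.

```python
from collections import OrderedDict


class ReadCache:
    """Hypothetical LRU cache standing in for cache memory 207."""

    def __init__(self, capacity, backing_store):
        self.capacity = capacity
        # Assumption: a dict of block address -> data models the slow disk drives.
        self.backing_store = backing_store
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.entries:
            # Cache hit: mark the block as most recently used.
            self.entries.move_to_end(address)
            self.hits += 1
            return self.entries[address]
        # Cache miss: fetch from the backing store and keep a copy.
        self.misses += 1
        data = self.backing_store[address]
        self._insert(address, data)
        return data

    def write(self, address, data):
        # Simplification: write-through. A real controller may instead
        # defer the slower write to the physical storage medium.
        self.backing_store[address] = data
        self._insert(address, data)

    def _insert(self, address, data):
        self.entries[address] = data
        self.entries.move_to_end(address)
        if len(self.entries) > self.capacity:
            # Evict the least recently used block.
            self.entries.popitem(last=False)
```

In this sketch, a second READ of the same block is served from `entries` without touching `backing_store`, which is the effect the cache memory is intended to achieve.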
Electronic data is stored within a disk array at specific addressable locations. Because a disk array may contain many different individual disk drives, the address space represented by a disk array is immense, generally many thousands of gigabytes to tens or hundreds of terabytes. The overall address space is normally partitioned among a number of abstract data storage resources called logical units (“LUNs”). A LUN includes a defined amount of electronic data storage space, mapped to the data storage space of one or more disk drives within the disk array, and may be associated with various logical parameters including access privileges, backup frequencies, and mirror coordination with one or more LUNs. LUNs may also be based on random access memory (“RAM”), mass storage devices other than hard disks, or combinations of memory, hard disks, and/or other types of mass storage devices. Remote computers generally access data within a disk array through one of the many abstract LUNs 208-215 provided by the disk array via internal disk drives 203-205 and the disk array controller 206. Thus, a remote computer may specify a particular unit quantity of data, such as a byte, word, or block, using a bus communications media address corresponding to a disk array, a LUN specifier, normally a 64-bit integer, that specifies a LUN, and a 32-bit, 64-bit, or 128-bit data address that specifies a location within the logical data address partition allocated to the LUN. The disk array controller translates such a data specification into an indication of a particular disk drive within the disk array and a logical data address within the disk drive. A disk drive controller within the disk drive finally translates the logical address to a physical medium address.
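The translation from a LUN specifier and data address to a particular disk drive and logical address within that drive can be illustrated by the following Python sketch. It is a minimal illustration under stated assumptions, not the mapping used by any particular disk array: the `Extent` and `LunMap` names are hypothetical, and each LUN is assumed to be a simple concatenation of contiguous extents on one or more disk drives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """Hypothetical contiguous region of one disk drive assigned to a LUN."""
    disk_id: int
    disk_start: int
    length: int


class LunMap:
    """Hypothetical table mapping each LUN to an ordered list of extents."""

    def __init__(self):
        self.luns = {}

    def define_lun(self, lun_id, extents):
        self.luns[lun_id] = list(extents)

    def translate(self, lun_id, lun_address):
        """Return (disk_id, logical address within that disk drive)."""
        if lun_address < 0:
            raise ValueError("negative address")
        offset = lun_address
        for extent in self.luns[lun_id]:
            if offset < extent.length:
                return extent.disk_id, extent.disk_start + offset
            # The address lies beyond this extent; move on to the next one.
            offset -= extent.length
        raise ValueError("address outside the LUN")
```

For example, a LUN defined as 100 addresses on disk drive 203 followed by 50 addresses on disk drive 204 maps LUN address 100 to the first address of the extent on disk drive 204. The final translation to a physical medium address, performed by the disk drive controller, is not modeled.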
Normally, electronic data is read and written as one or more blocks of contiguous 32-bit or 64-bit computer words, the exact details of the granularity of access depending on the hardware and firmware capabilities within the disk array and individual disk drives as well as the operating system of the remote computers generating I/O requests and characteristics of the communication medium interconnecting the disk array with the remote computers.
In many computer applications and systems that need to reliably store and retrieve data from a mass storage device, such as a disk array, a primary data object, such as a file or database, is normally backed up to backup copies of the primary data object on physically discrete mass storage devices or media so that if, during operation of the application or system, the primary data object becomes corrupted or inaccessible, or is overwritten or deleted, the primary data object can be restored by copying a backup copy of the primary data object from the mass storage device. Many different techniques and methodologies for maintaining backup copies have been developed. In one well-known technique, a primary data object is mirrored. FIG. 3 illustrates object-level mirroring. In FIG. 3, a primary data object “O3” 301 is stored on LUN A 302. The mirror object, or backup copy, “O3” 303 is stored on LUN B 304. The arrows in FIG. 3, such as arrow 305, indicate I/O WRITE requests directed to various objects stored on a LUN. I/O WRITE requests directed to object “O3” are represented by arrow 306. When object-level mirroring is enabled, the disk array controller providing LUNs A and B automatically generates a second I/O WRITE request from each I/O WRITE request 306 directed to LUN A, and directs the second generated I/O WRITE request via path 307, switch “S1” 308, and path 309 to the mirror object “O3” 303 stored on LUN B 304. In FIG. 3, enablement of mirroring is logically represented by switch “S1” 308 being on. Thus, when object-level mirroring is enabled, any I/O WRITE request, or any other type of I/O request that changes the representation of object “O3” 301 on LUN A, is automatically mirrored by the disk array controller to identically change the mirror object “O3” 303. Mirroring can be disabled, represented in FIG. 3 by switch “S1” 308 being in an off position. In that case, changes to the primary data object “O3” 301 are no longer automatically reflected in the mirror object “O3” 303. Thus, at the point that mirroring is disabled, the stored representation, or state, of the primary data object “O3” 301 may diverge from the stored representation, or state, of the mirror object “O3” 303. Once the primary and mirror copies of an object have diverged, the two copies can be brought back to identical representations, or states, by a resync operation represented in FIG. 3 by switch “S2” 310 being in an on position. In the normal mirroring operation, switch “S2” 310 is in the off position. During the resync operation, any I/O operations that occurred after mirroring was disabled are logically issued by the disk array controller to the mirror copy of the object via path 311, switch “S2,” and path 309. During resync, switch “S1” is in the off position. Once the resync operation is complete, logical switch “S2” is disabled and logical switch “S1” 308 can be turned on in order to reenable mirroring so that subsequent I/O WRITE requests or other I/O operations that change the storage state of primary data object “O3” are automatically reflected to the mirror object “O3” 303.
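The roles of logical switches S1 and S2 can be illustrated by the following Python sketch, which is offered only as an aid to understanding and not as a description of controller firmware. The class `MirroredObject`, its method names, and the use of an in-memory list to record WRITE requests received while mirroring is disabled are all assumptions made for the illustration.

```python
class MirroredObject:
    """Hypothetical model of a primary object and its mirror (FIG. 3)."""

    def __init__(self):
        self.primary = {}
        self.mirror = {}
        self.s1_on = True    # switch S1: mirroring enabled
        self.s2_on = False   # switch S2: resync in progress
        # Assumption: writes made while S1 is off are logged for later replay.
        self.unmirrored_writes = []

    def write(self, address, data):
        self.primary[address] = data
        if self.s1_on:
            # Second, automatically generated WRITE directed to the mirror.
            self.mirror[address] = data
        else:
            self.unmirrored_writes.append((address, data))

    def disable_mirroring(self):
        self.s1_on = False

    def resync(self):
        # S2 on, S1 off: replay the writes that occurred after the split.
        self.s1_on = False
        self.s2_on = True
        for address, data in self.unmirrored_writes:
            self.mirror[address] = data
        self.unmirrored_writes.clear()
        self.s2_on = False

    def enable_mirroring(self):
        self.s1_on = True
```

In this sketch the two copies diverge after `disable_mirroring()` and are brought back to identical states by `resync()`, after which `enable_mirroring()` corresponds to turning switch S1 back on.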
In the described embodiment, mirroring is conducted by a disk array controller on a per LUN basis. In the described embodiment, a LUN may be mirrored on a remote LUN for various reasons, including for preparation of a backup copy of a LUN for database backup, checkpoint, or archival purposes. For these purposes, a LUN may be mirrored for some interval of time, and the mirroring may then be disabled, or, in other words, the mirror LUN may then be split, so that the remote LUN of the local-LUN/remote-LUN mirror pair can be used as a consistent snapshot of the data state of the local LUN at a later point in time. Thus, the mirroring capability built into disk array controllers can be exploited for efficient, hardware-level data backup by database management systems and other application programs.
Unfortunately, the mirror split operation may take a fairly long period of time to complete, during which time the remote LUN is not available for backup and archival purposes. In many cases, backup utilities and other application programs or system routines that employ hardware-based mirroring for generating backup copies require that neither the remote LUN nor the local LUN be actively responding to host-computer I/O requests during the mirror split operation. Thus, although an instantaneous mirror split would be most desirable from the standpoint of an application program, such as a database management system, in order to quickly produce backup copies without interrupting service to users, hardware-based backup copy generation by mirror split operations can result in extensive downtime for the application, in the case of large gigabyte-sized LUNs, often ranging from many tens of minutes to hours.
FIGS. 4A-D illustrate problems that prevent quick mirror split operations. In FIGS. 4A-D, a simplified, abstract view of mirror-related I/O request handling is provided. The view is simplified because only a single mirrored LUN pair is discussed, although a disk array controller needs to concurrently handle processing of I/O requests directed to tens to thousands of mirrored LUN pairs. Nonetheless, the problems illustrated in FIGS. 4A-D are representative, on a small scale, of the many concurrently overlapping problems experienced by a disk array controller.
FIGS. 4A-D employ similar illustrative conventions. These conventions are described with reference to FIG. 4A, and many of the numerical labels introduced in FIG. 4A will be used in all of FIGS. 4A-D, as well as in FIGS. 5A-D, discussed in a following section.
In FIG. 4A, I/O requests directed to a mirrored LUN pair are input to an input queue 402. These I/O requests are dequeued from the input queue 402 and processed by the disk array controller. In the current discussion, the I/O requests are assumed to be WRITE requests. The disk array controller dequeues WRITE requests from the input queue 402 and processes each WRITE request by writing data to a local LUN 404 and queuing the WRITE request to an output queue 406 from which the WRITE request will be transferred to the remote LUN of a remote disk array 408. Note that the output queue 406 represents, for current purposes, the entire process of transferring a mirror WRITE request to the remote LUN and executing the WRITE request there. As will be seen below, problems may arise from a backlog of pending mirror WRITE requests, but it is immaterial, in the context of the present discussion, which step in the overall process of transferring and executing mirror WRITE requests represents the throttle point, or rate-limiting step. The LUN mirroring can also occur within a single disk array, in which both LUNs of a mirrored LUN pair are local LUNs. However, it is convenient for descriptive purposes to discuss an embodiment in which one LUN of a mirrored pair resides in a local disk array and the other LUN of the mirrored LUN pair resides in a remote disk array, as shown in FIG. 4A. Note that FIG. 4A is an abstract representation of actual I/O request processing that occurs within a disk array controller. In many implementations, there are not separate input and output queues for each LUN, for example, and many different types of I/O requests are processed, in addition to WRITE requests. Again however, the complex problems arising in a disk array can be thought of as a combination of many problems similar to the problems illustrated in FIGS. 4A-D for a single mirrored LUN pair.
The simple scheme illustrated in FIG. 4A can be abstractly represented in a slightly different manner. In FIG. 4B, input queue 402 of FIG. 4A is divided into two input queues: (1) an input queue 410 that contains WRITE I/O requests directed to the local LUN; and (2) an input queue 412 containing mirror WRITE requests directed to the remote LUN of the local LUN/remote LUN mirrored pair. Thus, the two input queues represent the two separate WRITE operations conducted by a disk array controller for each host-computer WRITE request received by the disk array controller directed to a mirror pair. When the available WRITE request processing bandwidth within the local disk array is sufficient for processing WRITE requests directed to the local LUN as well as mirror WRITE requests directed to the remote LUN, then the sizes of the two input queues 410 and 412 and the output queue 406 tend to reach steady-state, equilibrium sizes, serving to briefly buffer small fluctuations in the rate of reception of WRITE requests and processing of the WRITE requests by the disk array controller.
Often, however, the available bandwidth within the disk array controller is insufficient to handle the combined WRITE-request processing requirements for locally executed WRITE requests and for mirror WRITE requests forwarded to the remote disk array. In this case, at least one of the two input queues and the output queue needs to grow in order to buffer WRITE requests received at a higher rate than they can be processed by the disk array controller. A first, common approach to processing WRITE requests in the face of insufficient available WRITE-request processing bandwidth is illustrated in FIG. 4C. In this approach, transfer of WRITE requests to the remote LUN is considered asynchronous with respect to execution of WRITE requests by the local LUN. In other words, when a host-computer WRITE request is received, the WRITE request is executed on the local LUN without regard to the eventual time of execution of the WRITE request on the remote LUN. In this case, the output queue 406 tends to increase in size, as illustrated in FIG. 4C. As the output queue 406 grows in size, more and more WRITE requests are buffered within the output queue, and the data state of the remote LUN falls further and further behind the data state of the local LUN. The disk array manages to execute WRITE requests received from host computers without perceptible delay, trading off quick response time to host computers for an increasingly large discrepancy between the data state of the local LUN and that of the remote LUN. When a split operation is directed to the disk array controller, the disk array controller needs to discontinue processing I/O requests received from external sources, and devote most of the available bandwidth to processing mirror WRITE requests, dequeued from the output queue 406, in order to bring the data state of the remote LUN to a state consistent with the data state of the local LUN.
Hence, for large LUNs, the split operation may take several hours in order to transfer WRITE requests backed up on the output queue 406 to the remote LUN and execute those WRITE requests on the remote LUN.
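The growth of the output queue under asynchronous mirroring, and the resulting cost of a split operation, can be illustrated with a small discrete-time Python model. The model is a sketch under simplifying assumptions that do not come from the described embodiment: time advances in fixed ticks, every local WRITE and every mirror WRITE costs one unit of controller bandwidth, and the function names `simulate_async` and `ticks_to_drain` are hypothetical.

```python
from collections import deque


def simulate_async(arrivals_per_tick, bandwidth_per_tick, ticks):
    """Return (input backlog, output backlog) after the given number of ticks."""
    input_queue = deque()    # host WRITE requests awaiting local execution
    output_queue = deque()   # mirror WRITE requests awaiting the remote LUN
    next_id = 0
    for _ in range(ticks):
        for _ in range(arrivals_per_tick):
            input_queue.append(next_id)
            next_id += 1
        budget = bandwidth_per_tick
        # Local WRITE requests are executed preferentially.
        while budget > 0 and input_queue:
            output_queue.append(input_queue.popleft())
            budget -= 1
        # Only leftover bandwidth is spent on mirror WRITE requests.
        while budget > 0 and output_queue:
            output_queue.popleft()
            budget -= 1
    return len(input_queue), len(output_queue)


def ticks_to_drain(backlog, bandwidth_per_tick):
    """Ticks a split needs to flush the backlog with host I/O suspended."""
    return -(-backlog // bandwidth_per_tick)  # ceiling division
```

With three arrivals and four units of bandwidth per tick, each tick leaves only one unit for mirror WRITE requests, so the output backlog in this model grows by two requests per tick while the input queue stays empty. The longer the mirror pair runs in this mode, the longer the eventual split must suspend host I/O in order to drain the backlog.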
An alternative approach to processing WRITE requests by the disk array controller, when insufficient WRITE-request processing bandwidth is available, is illustrated in FIG. 4D. In this approach, execution of a WRITE request on the local LUN is synchronous with execution of the corresponding mirror WRITE request on the remote LUN. In other words, a host-computer WRITE request must be executed to completion on both the local LUN and the remote LUN prior to processing of another host-computer WRITE request directed to the mirror pair. In this synchronous mirroring mode, as illustrated in FIG. 4D, the input queue 410 containing WRITE requests directed to the local LUN grows in size in order to buffer WRITE requests received from host computers that cannot be processed as quickly as they are received. In synchronous mirroring mode, a split operation can be accomplished almost instantaneously, because there are no backed-up WRITE requests languishing in the output queue, as is the case under asynchronous mirroring, illustrated in FIG. 4C. However, synchronous mode mirroring may result in serious interruption of host-I/O-request processing by the disk array controller. The WRITE requests buffered within input queue 410 in FIG. 4D may not be processed in a timely fashion, and host computer processes that issued the WRITE requests may be stalled or significantly slowed by increasing I/O request latency.
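Synchronous mirroring can be illustrated with a similar discrete-time Python model. As before, the model is only a sketch: the tick-based timing, the assumption that a host WRITE request consumes two units of bandwidth (one for the local LUN and one for the remote LUN), and the function name `simulate_sync` are hypothetical and are not taken from the described embodiment.

```python
from collections import deque


def simulate_sync(arrivals_per_tick, bandwidth_per_tick, ticks):
    """Return (input backlog, completed host WRITE requests)."""
    input_queue = deque()   # host WRITE requests not yet executed anywhere
    completed = 0
    next_id = 0
    for _ in range(ticks):
        for _ in range(arrivals_per_tick):
            input_queue.append(next_id)
            next_id += 1
        budget = bandwidth_per_tick
        # Each request is executed on both LUNs before the next one starts,
        # so no mirror backlog ever accumulates.
        while budget >= 2 and input_queue:
            input_queue.popleft()
            budget -= 2
            completed += 1
    return len(input_queue), completed
```

With three arrivals and four units of bandwidth per tick, only two host WRITE requests complete per tick in this model, so the input queue grows by one request per tick. The mirror pair remains consistent at all times, at the cost of increasing latency for host computers.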
In general, asynchronous mirroring mode is preferred over synchronous mirroring mode, because timely processing of host-computer I/O requests is usually more important than mirror pair consistency. However, under asynchronous mirroring mode, mirror split operations are often quite slow, requiring tens of minutes to hours to flush and execute unprocessed WRITE requests directed to the remote LUN of a local LUN/remote LUN mirror pair. For this reason, designers, manufacturers, and users of data storage devices, including disk arrays, have recognized the need for a LUN mirroring method that does not significantly impact throughput of host I/O request processing but that supports relatively quick mirror split operations in order to provide hardware-level data storage backup capabilities.
In one embodiment of the present invention, a split advance warning feature is added to the controller of a disk array. This feature allows a host-computer application program, or host-computer system routine, to notify the disk-array controller of an impending mirror split operation related to a particular mirrored LUN pair provided by the disk array controller and, in the case that a LUN of the mirrored LUN pair is located in a remote disk array, by both the disk array controller and the disk array controller of a remote disk array. Upon receiving a split advance warning for a particular mirrored LUN pair, the disk-array controller can shift processing of I/O requests for the mirrored LUN pair from a purely asynchronous mode, in which WRITE requests directed to the local LUN are executed preferentially and asynchronously with respect to corresponding mirror WRITE requests directed to the remote LUN, to a hybrid WRITE-request processing mode that attempts to place the local LUN and mirror LUN in approximately data-consistent states at the time that the impending mirror split operation is expected to occur. Thus, prior to a mirror split operation, processing of host-computer I/O requests on the local LUN may slow in order to process a backlog of mirror WRITE requests directed to the remote LUN.
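One way such a hybrid mode might behave can be illustrated with the following Python sketch. The scheduling policy shown is an assumption made purely for illustration and should not be read as the policy of the described embodiment: after the warning, the hypothetical controller reserves enough bandwidth each tick to drain the existing mirror backlog evenly before the expected split time, and executes new host WRITE requests synchronously, at a cost of two bandwidth units each, so that the backlog cannot grow. The model counts requests only and does not model write ordering; the function name `simulate_split_warning` and the tick-based timing are likewise hypothetical.

```python
import math
from collections import deque


def simulate_split_warning(arrivals_per_tick, bandwidth_per_tick,
                           warning_tick, split_tick):
    """Return (input backlog, mirror backlog) at the expected split time."""
    input_queue = deque()
    output_queue = deque()
    next_id = 0
    for tick in range(split_tick):
        for _ in range(arrivals_per_tick):
            input_queue.append(next_id)
            next_id += 1
        budget = bandwidth_per_tick
        if tick < warning_tick:
            # Purely asynchronous mode: local writes first, mirror with leftovers.
            while budget > 0 and input_queue:
                output_queue.append(input_queue.popleft())
                budget -= 1
            while budget > 0 and output_queue:
                output_queue.popleft()
                budget -= 1
        else:
            # Hybrid mode (assumed policy): drain the backlog on schedule.
            remaining = split_tick - tick
            reserve = min(budget, math.ceil(len(output_queue) / remaining))
            for _ in range(reserve):
                output_queue.popleft()
            budget -= reserve
            # New host writes are mirrored synchronously (two units each).
            while budget >= 2 and input_queue:
                input_queue.popleft()
                budget -= 2
    return len(input_queue), len(output_queue)
```

In this model, with three arrivals and four units of bandwidth per tick, a warning issued at tick 10 for a split expected at tick 20 leaves no mirror backlog at the split time, while host WRITE requests accumulate in the input queue during the warning interval. Setting `warning_tick` equal to `split_tick` models the absence of a warning, in which case the full asynchronous backlog remains to be flushed when the split is requested.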