1. Technical Field
The present invention relates in general to a method and system for recovering data from mirror drives following a system failure. Still more particularly, the present invention relates to a method for recovering data from parallel mirror drives by recording the first drive to store a data item.
2. Description of the Related Art
Computer systems in general and International Business Machines (IBM) compatible personal computer systems in particular have attained widespread use for providing computer power to many segments of today's modern society. A personal computer system can usually be defined as a desk top, floor standing, or portable microcomputer that includes a system unit having a system processor and associated volatile and non-volatile memory, a display monitor, a keyboard, one or more diskette drives, a fixed disk storage device and an optional printer. One of the distinguishing characteristics of these systems is the use of a system board to electrically connect these components together. These personal computer systems are information handling systems which are designed primarily to give independent computing power to a single user (or a relatively small group of users in the case of personal computers which serve as computer server systems) and are inexpensively priced for purchase by individuals or small businesses. A personal computer system may also include one or a plurality of I/O devices (i.e. peripheral devices) which are coupled to the system processor and which perform specialized functions. Examples of I/O devices include modems, sound and video devices or specialized communication devices. Mass storage devices such as hard disks, CD-ROM drives and magneto-optical drives (nonvolatile storage devices) are also considered to be peripheral devices. Data stored on nonvolatile storage devices is often extremely critical to an organization. Due to their critical nature, nonvolatile storage devices are often backed up on a regular basis in case a disaster or other failure occurs to the computer system and the nonvolatile storage devices attached to it.
Backing up a nonvolatile storage device can be a slow process if the data is backed up onto another type of media, such as backing up a hard drive to a set of magnetic tapes. In addition, backups onto magnetic tapes may only occur on a periodic basis. For example, a system backup may be taken every night. Data written to the nonvolatile storage device after the last backup is lost if a failure destroys the nonvolatile storage device.
To provide greater data protection regarding a system's data stored on nonvolatile storage devices, many systems use drive arrays to increase capacity and reliability of their nonvolatile storage. Essentially, a drive array is a way to combine a number of individual nonvolatile storage devices to create a massive virtual system by connecting the separate drives to a single drive controller and coordinating their operation.
Several implementations of drive arrays have been developed to increase both the capacity and the reliability of nonvolatile storage. One method of implementing drive arrays is using Redundant Arrays of Inexpensive Disks, or "RAID," technology. RAID technology has various levels (i.e., RAID Level 0, 1, 2, 3, etc.). Each level is a different implementation of multiple disks. For example, RAID Level 1 refers to mirroring the contents of one disk onto another disk. In this manner, if the first disk fails, the same data would be intact on the second disk. Conversely, if the second disk failed, the data would be intact on the first disk.
Two basic approaches to mirrored disks are used. The first approach treats the first disk as the "master" disk and the second disk as the "backup" disk. Data is first written to the master disk by the controller and then written to the backup disk. This method is referred to as "sequential" mirroring because a master-subordinate relationship is used to control the disks. The disadvantage of sequential mirroring, however, is that it is relatively slow, requiring a sequential writing operation. If the master disk is busy while the backup disk is idle, the operation has to wait for the master disk even though the backup disk is ready.
The second approach to mirroring disks is referred to as "parallel" mirroring. As the name implies, data is written in parallel to the various disks that comprise the mirror. If the first disk is busy while the second disk is idle, the second disk can immediately store the data while the first disk finishes its previous operation and then stores the data. Because of the parallel nature, however, no master-backup relationship exists. Consequently, when the system recovers from a failure (e.g., after a system crash), it is unknown which disk contains the latest information. Indeed, since the drives are operating in parallel, one drive may have the latest information for one disk address while another drive has the latest information for another disk address. Determining the latest data that resides on parallel mirrored disks is a challenge for drive manufacturers and their customers. This challenge is exacerbated by the fact that multiple mirror disks may exist (e.g., mirroring data across 3 or 4 disks rather than only 2 disks).
In IBM's AIX operating system (similar to the UNIX operating system), a data queue is maintained that keeps track of the last 62 writes that are about to be written to disk. During recovery processing following a system failure, the disk controller reads the queue to attempt to synchronize the last 62 writes. Because no true master drive exists, the controller selects the first drive that will communicate with the controller and designates this drive to be the master drive. However, as discussed above, at the time of the crash the designated master may not have the latest data. This will result in older data being copied over newer data during system recovery. For example, if drive 1 is designated the master but drive 0 has the latest data in address 100 (data that was written to drive 0 just before the crash), the controller will copy the old data from address 100 on drive 1 to address 100 on drive 0 (thus replacing the new data in that address on drive 0). The "latest" data held on drive 0 is thereby overwritten. After the recovery operation, all mirror drives will once again be identical. By copying the data in this fashion, this method, known as a Mirror Write Consistency Check (MWCC), guarantees consistent, rather than the latest, data. Thus, a challenge of the prior art is determining which disk contains the latest data for a given disk address.
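The overwrite in the example above can be illustrated with a small simulation. This is a minimal sketch and not the AIX implementation: each drive is modeled as a dictionary mapping a disk address to its data, and the function name `mwcc_recover` and its parameters are hypothetical.

```python
def mwcc_recover(disks, pending_addresses, master):
    """Copy each pending address from the designated master to every other disk.

    disks: list of dicts (address -> data), a toy stand-in for mirror drives.
    pending_addresses: addresses recorded in the write queue before the crash.
    master: index of the drive the controller designated as master.
    """
    for address in pending_addresses:
        value = disks[master].get(address)
        for index, disk in enumerate(disks):
            if index != master:
                disk[address] = value
    return disks


# Drive 0 finished the write to address 100 just before the crash; drive 1 did not.
drive0 = {100: "new"}
drive1 = {100: "old"}
disks = mwcc_recover([drive0, drive1], pending_addresses=[100], master=1)
print(disks)  # both drives now hold "old": consistent, but not the latest data
```

Because drive 1 was designated the master, the newer value on drive 0 is replaced, which is the loss the paragraph describes.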
What is needed, therefore, is a way to identify the latest data residing on any one disk in a parallel mirror array. In addition, recording which disk holds the latest data would preferably be performed quickly, requiring few, if any, additional writes to the nonvolatile storage devices to record the information.
It has been discovered that keeping a completion array in memory that keeps track of the first disk to complete a write operation provides for the latest data to be propagated to the other disks when a recovery from a system failure occurs. The in-memory array would keep track of the disk number that first successfully wrote data and returned the request along with the disk address where the data resides.
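One way such an in-memory completion array could be structured is sketched below. The class name `CompletionArray`, its methods, and the use of a dictionary keyed by disk address are assumptions made for illustration; the text specifies only that the disk number and the disk address are recorded for the first disk to complete a write.

```python
class CompletionArray:
    """Toy model: remembers, per disk address, the first disk to finish a write."""

    def __init__(self):
        self.first_writer = {}  # disk address -> number of the disk that finished first
        self._open = set()      # addresses whose current write has no completion yet

    def start_write(self, address):
        # A new parallel write to this address has been issued to all mirrors.
        self._open.add(address)

    def complete(self, address, disk):
        # Only the first disk to report completion for the open write is recorded.
        if address in self._open:
            self.first_writer[address] = disk
            self._open.discard(address)


array = CompletionArray()
array.start_write(100)
array.complete(100, disk=0)  # drive 0 returns first and is recorded
array.complete(100, disk=1)  # drive 1 returns later and is ignored
print(array.first_writer)    # {100: 0}
```

Recording happens entirely in memory in this sketch, so it adds no extra disk writes on the normal write path.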
In another embodiment, the Mirror Write Consistency Check (MWCC) array is compared with the in-memory array during failure recovery to match the data that was about to be written in the MWCC with the data residing on the first disk identified in the completion array.
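The comparison in this embodiment might look like the following minimal sketch. The function name `select_sources` is hypothetical, and the fallback to a designated master for addresses that have no completion entry is an assumption of this sketch, since the text does not say how such addresses are handled.

```python
def select_sources(mwcc_addresses, first_writer, fallback_master):
    """Pick, for each address in the MWCC queue, the disk to copy from.

    mwcc_addresses: addresses listed in the MWCC queue of pending writes.
    first_writer: completion array as a dict (address -> first disk to finish).
    fallback_master: disk used when no completion entry exists (assumption).
    """
    sources = {}
    for address in mwcc_addresses:
        sources[address] = first_writer.get(address, fallback_master)
    return sources


sources = select_sources([100, 200, 300], {100: 0, 200: 1}, fallback_master=1)
print(sources)  # {100: 0, 200: 1, 300: 1}
```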
Upon a system failure, the in-memory array is written to a system dump area on the disk, as there is a small window of time after a system failure during which the status of the system is written to a nonvolatile storage area (e.g., nonvolatile memory, disk area, etc.). When the system is rebooted following the failure, the system identifies and rebuilds the array that was dumped to nonvolatile storage. The array is then processed to identify the disk containing the latest data for a particular disk address. That disk is used as the master disk for that particular disk address and the data is copied to the other disks. The next item in the array contains the next disk address and the disk that first wrote the data to its platter. This process continues until all items in the array have been read and the corresponding data copied to the disks.
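The per-address recovery loop described above can be sketched as follows. This is an illustration under stated assumptions: drives are modeled as dictionaries mapping a disk address to data, the rebuilt array is a list of (address, disk number) pairs, and the function name `recover_from_completion_array` is hypothetical.

```python
def recover_from_completion_array(disks, completion_entries):
    """For each entry, treat the recorded disk as master for that address only.

    disks: list of dicts (address -> data), a toy stand-in for mirror drives.
    completion_entries: (address, disk_number) pairs rebuilt from the dump.
    """
    for address, source in completion_entries:
        value = disks[source][address]
        for index, disk in enumerate(disks):
            if index != source:
                disk[address] = value
    return disks


# Drive 0 wrote address 100 first; drive 1 wrote address 200 first.
drive0 = {100: "new-100", 200: "old-200"}
drive1 = {100: "old-100", 200: "new-200"}
entries = [(100, 0), (200, 1)]
recovered = recover_from_completion_array([drive0, drive1], entries)
print(recovered)
```

Unlike a single designated master, the source disk here can differ from one address to the next, so both drives end up holding the newer value at each address.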
In yet another embodiment, the completion array is written to nonvolatile storage as well as memory so that, upon recovering from a failure, the completion array can be read from the nonvolatile storage device and processed accordingly.
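A minimal sketch of persisting the completion array follows. The JSON file format, the file name, and the function names `save_completion_array` and `load_completion_array` are assumptions chosen for illustration; the text says only that the array is written to nonvolatile storage and read back during recovery.

```python
import json
import os
import tempfile


def save_completion_array(path, first_writer):
    """Write (address, disk) pairs to a file and force them to stable storage."""
    pairs = [[address, disk] for address, disk in first_writer.items()]
    with open(path, "w") as handle:
        json.dump(pairs, handle)
        handle.flush()
        os.fsync(handle.fileno())


def load_completion_array(path):
    """Rebuild the address -> disk mapping from the stored pairs."""
    with open(path) as handle:
        return {address: disk for address, disk in json.load(handle)}


path = os.path.join(tempfile.mkdtemp(), "completion_array.json")
save_completion_array(path, {100: 0, 200: 1})
restored = load_completion_array(path)
print(restored)
```

Storing the entries as pairs rather than as a JSON object keeps the addresses as integers when they are read back.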
A computer system that uses a completion array to record disk writes and recover from system failures is further disclosed. In addition, a computer program product is disclosed that stores the method of recording the completion array and the method of recovering from a system failure using the stored completion array.
The foregoing is a summary and thus contains, by necessity, simplifications, generalizations, and omissions of detail; consequently, those skilled in the art will appreciate that the summary is illustrative only and is not intended to be in any way limiting. Other aspects, inventive features, and advantages of the present invention, as defined solely by the claims, will become apparent in the non-limiting detailed description set forth below.