1. Field of the Invention
The present invention is directed to computer systems that support disk mirroring, and more particularly, to methods and apparatus for transferring a mirrored set from a failed system to a standby system during a fail-over operation.
2. Description of the Prior Art
Certain computer systems, such as, for example, Unisys enterprise servers that run the Unisys MCP operating system, including A Series and ClearPath NX computer systems, have for some time provided the ability to maintain from two to four disks as a mirrored set, that is, as exact copies of each other. Initial creation of a mirrored set involves copying all of the data from a source disk to a destination disk. Up to two additional copies can be added later in the same way. On the aforementioned Unisys systems, mirrored sets are created using a mirror create process that employs a series of read/write pairs of large blocks of data. The source data is read, then written to the destination disk. Creation and maintenance of the mirrored set is managed by the operating system (MCP). This differs from other computer vendors who place responsibility for creating and maintaining mirrored sets in the disk controller. The Unisys approach, in which disk mirroring is managed by the operating system, is advantageous because failure of a controller will not necessarily cause a loss of the mirrored disk set.
In order to maintain its mirrored disk sets, the Unisys MCP operating system uses three main structures: a portion of the physical pack label area, a mirror information table (MIT) and an outstanding write list (OWL). The mirror information portion of the physical pack label contains a relative set member, member timestamp and member status, along with a four-bit mask of current set members; it is the only part of a mirrored pack not kept identical across all ONLINE members. Unless a mirrored pack indicates "closed" status in its label, there must be an entry for that mirrored pack in the MIT for the pack to be brought online as a mirrored unit. (Closing an in-use mirrored set updates the labels of all current members to indicate this status and removes the set's entry from the MIT; only when closed can a mirrored set be ported intact from one system to another.) The MIT contains status information about all in-use mirrored sets of a system, and is stored on a special system disk called the halt/load unit. The OWL is maintained in non-volatile system memory, and is a record of all write operations to mirrored sets that have been begun but not yet completed. The OWL contains validation timestamps indicating the MIT to which it corresponds; the old and new timestamps differ only while the MIT is being updated. In the event of a system interruption, the ONLINE members of all mirrored sets will be preserved provided the MIT and OWL previously in use are still intact. A fourth mirroring structure, the audit table, is a record of all out-of-date areas of temporarily OFFLINE mirrored set members. Returning an OFFLINE member to ONLINE status involves audit application, which brings its out-of-date areas up to date from the other ONLINE members. Because the audit table is kept in volatile system memory, it is not maintained across system interruptions, so OFFLINE members of mirrored sets are lost across system interruptions.
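The OWL-to-MIT correspondence described above can be sketched as follows. This is an illustrative model only, not the actual MCP implementation; the class, function and field names are assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class OutstandingWriteList:
    """Illustrative model of the OWL's two validation timestamps."""
    old_mit_timestamp: int  # MIT timestamp before any in-progress update
    new_mit_timestamp: int  # MIT timestamp after the update (equal when quiescent)

def owl_matches_mit(owl: OutstandingWriteList, mit_timestamp: int) -> bool:
    """The OWL corresponds to the on-disk MIT if the MIT's timestamp matches
    either validation timestamp; the two differ only while the MIT is being
    rewritten, so both must be accepted."""
    return mit_timestamp in (owl.old_mit_timestamp, owl.new_mit_timestamp)

def online_members_recoverable(mit_intact: bool,
                               owl: Optional[OutstandingWriteList],
                               mit_timestamp: int) -> bool:
    """After a system interruption, ONLINE members are preserved only when
    both the MIT and a matching OWL survive intact."""
    return mit_intact and owl is not None and owl_matches_mit(owl, mit_timestamp)
```

Accepting either timestamp is what makes the scheme safe against an interruption that strikes mid-way through an MIT rewrite.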
FIG. 1 illustrates the overall contents of a mirror information table 10. The table includes a header portion 12, followed by an entry 14 for each mirrored set on the system. As shown, the header 12 holds the OWL synchronization timestamp (MIT_TIMESTAMP_INX), as well as other information (not shown). In the present embodiment, the entries 14 that follow the header 12 are each 16 words in length.
FIG. 2 shows the overall structure of one MIT entry 14. The first four words 16 of the entry contain information about the mirrored set as a whole. The MIT_SERIALNO word contains critical state and option information, including the set serial number (mit_serialnof), the OWL loss recovery (mit_optionf) and transient error recovery (mit_quickaudit_okf) strategies to be used for the set, and information used to support the Mirror Disk Pooling Facility (MDPF), if this optional feature is licensed to a site. (MDPF allows automated restoration of mirrored set members lost in most circumstances, minimizing the operator intervention required.) The MIT_ID word holds the current set timestamp (MIRROR_ID of the set). The MIT_SETINFO1 word holds transient MCP status information not currently preserved across system interruptions. The fourth word is currently unused, having previously held in-core-only information.
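The 16-word entry layout just described can be modeled roughly as follows. The field groupings track the description (four set-wide words, then three words for each of four possible members), but the Python types and names are illustrative assumptions; the third per-member word is not detailed in this section, so it is left as a reserved slot.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class MitMemberEntry:
    """Three words per potential set member (up to four members)."""
    mit_label_id: int = 0   # member timestamp; non-zero means the slot is valid
    mit_setinfo2: int = 0   # physical pack info, member status, logical unit
    reserved: int = 0       # third member word (contents not detailed here)

@dataclass
class MitEntry:
    """One 16-word MIT entry: four set-wide words plus 3 words x 4 members."""
    mit_serialno: int = 0   # serial number, recovery options, MDPF info
    mit_id: int = 0         # current set timestamp (MIRROR_ID of the set)
    mit_setinfo1: int = 0   # transient MCP status information
    unused: int = 0         # fourth word, currently unused
    members: List[MitMemberEntry] = field(
        default_factory=lambda: [MitMemberEntry() for _ in range(4)])

    def valid_members(self) -> List[int]:
        """A member slot is valid when its MIT_LABEL_ID word is non-zero."""
        return [i for i, m in enumerate(self.members) if m.mit_label_id != 0]
```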
Referring to the MIT_SERIALNO word of the entry 14, the two OWL loss recovery options allowed are DISCARD (mit_optionf=0), which means that the MCP operating system is responsible for data resynchronization, and DMS (mit_optionf=1), which means that application programs are responsible for data resynchronization. The DMS option is named for the Unisys Data Management System that normally performs this task. The mit_owl_lostf ("the OWL has been lost") and mit_break_setf ("set is to be broken") flags are status flags indicating how a set must be handled when it reappears after an interruption. Once set, these flags remain set until this handling has been completed. Note that an "OWL has been lost" condition always necessitates breaking of in-use mirrored sets with a recovery option of DISCARD, because the MCP does not have the critical OWL structure information required to resynchronize set members. Either the mit_recreate_setf flag ("MDPF recreate set") or the mit_break_setf flag is always set along with the mit_owl_lostf flag for sets with a recovery option of DISCARD across an interruption. When individual set members (rather than the entire set) are lost and MDPF recreation is possible, a running count of lost members needing replacement is kept in the mit_set_needs_replacementf field. The four-bit mit_recreate_maskf field indicates which member(s) to recreate. The mit_noaccessf flag ("set never accessed") is a new flag that has been added in accordance with an aspect of the present invention. This field is described hereinafter in greater detail.
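The OWL-loss rule above can be sketched as follows. The flag names follow the text, but the control flow is one illustrative reading of the rule, not the MCP source: a DISCARD set whose OWL is lost must either be recreated via MDPF or be broken, while a DMS set is left to application-level resynchronization.

```python
DISCARD, DMS = 0, 1  # the two mit_optionf recovery-option values

def handle_owl_loss(entry: dict, mdpf_recreation_possible: bool) -> None:
    """Mark an in-use set's MIT entry after the OWL has been lost.

    For DISCARD sets the MCP lacks the OWL information needed to
    resynchronize members, so either mit_recreate_setf (MDPF recreation)
    or mit_break_setf must accompany mit_owl_lostf."""
    entry["mit_owl_lostf"] = True
    if entry["mit_optionf"] == DISCARD:
        if mdpf_recreation_possible:
            entry["mit_recreate_setf"] = True
        else:
            entry["mit_break_setf"] = True
```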
Still referring to FIG. 2, the remaining twelve words of a MIT entry (shown at 18) consist of three words for each of the four (4) possible set members. A member is valid if its member timestamp word (MIT_LABEL_ID) is non-zero. When the labels of one or more mirrored set members are being updated, the new timestamp is first stored into the MIRROR_ID of the set and the MIT preserved to disk. Then, as individual members' labels are updated, new MIT_LABEL_ID timestamps are stored into the in-core MIT. Finally, when all label updates have been completed, the MIT is again written to disk. This ensures that a valid set member label must always match either its MIT_LABEL_ID or the MIRROR_ID of its set (or both, if they are identical). The MIT_SETINFO2 word of a member entry has MCP physical pack information, along with two critical fields: the current member status (mit_statef) and its logical unit association (mirror_lu_nof). A non-zero value in mirror_lu_nof means that a member is currently "known" by the MCP to be associated with a particular physical unit. A value of zero means that the current MCP incarnation has not logically seen this member yet. The mit_statef of a member indicates its current logical status: typically ONLINE if currently in use; OFFLINE if an audit trail is being kept for the member; APPLYING AUDIT if a member is either being created or being restored to ONLINE status (mirror creation can be considered application of an audit trail specifying that the entire pack is out of date); or possibly ORPHAN if the member is known to be out-of-date but its physical label on disk has not yet been invalidated.
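The label-update protocol implies a simple validity rule for member labels, sketched here; the function and parameter names are illustrative assumptions.

```python
def member_label_valid(label_timestamp: int,
                       mit_label_id: int,
                       mirror_id: int) -> bool:
    """A valid member's on-disk label timestamp must match either its own
    MIT_LABEL_ID word or the set-wide MIRROR_ID (or both, when identical).

    The two-phase update order guarantees this: MIRROR_ID is committed to
    disk first, then individual labels and in-core MIT_LABEL_ID words are
    updated, then the MIT is rewritten, so an interruption at any point
    leaves every valid label matching at least one of the two timestamps."""
    return label_timestamp in (mit_label_id, mirror_id)
```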
After a system interruption, MCP initialization pre-processes the MIT before dealing with any mirrored sets. All members with ONLINE status are changed to ONLINE_BEFORE_HALT_LOAD (OLBH) status pending their reappearance. Halt/Load is a Unisys term that refers to the process by which a system running the MCP operating system is started, analogous to the boot process in a desktop computer. Members in the OFFLINE, GOING OFFLINE and APPLYING AUDIT states are either set OFFLINE and marked for possible recreation (setting mit_recreate_setf along with the proper bit in the mit_recreate_maskf field) or immediately converted to ORPHAN status. ORPHAN members, and those in the transient SET BEING OPENED/CLOSED states, are unaffected. All logical unit associations are invalidated.
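The per-member transitions performed by this pre-processing can be summarized in a sketch. The state names follow the text; the choice between OFFLINE-with-recreation and immediate ORPHAN conversion is abstracted into a caller-supplied flag, since the deciding criteria are not detailed in this section.

```python
from enum import Enum, auto

class State(Enum):
    ONLINE = auto()
    OLBH = auto()                    # ONLINE_BEFORE_HALT_LOAD
    OFFLINE = auto()
    GOING_OFFLINE = auto()
    APPLYING_AUDIT = auto()
    ORPHAN = auto()
    SET_BEING_OPENED_CLOSED = auto() # transient open/close states

def preprocess_member(state: State, recreation_possible: bool) -> State:
    """Per-member MIT pre-processing at halt/load time."""
    if state is State.ONLINE:
        # Pending the member's reappearance after the interruption.
        return State.OLBH
    if state in (State.OFFLINE, State.GOING_OFFLINE, State.APPLYING_AUDIT):
        # Either mark OFFLINE for possible MDPF recreation, or orphan it.
        return State.OFFLINE if recreation_possible else State.ORPHAN
    # ORPHAN members and transient open/close states are unaffected.
    return state
```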
Recently, customers have begun to request redundant systems to maintain high availability of their computing facilities. Essentially, these high availability configurations consist of two separate, but connected, computer systems (e.g., two A Series computers or two ClearPath HMP computer systems) of similar capability. One of the two systems functions as a "hot" standby, while the other serves as the active, or currently operational system. In the event of a failure of the active system, customer operations "fail-over" to the standby system. In these configurations, the peripherals needed for active system operation, including all those disk packs comprising the customer "pack data farm", are connected to both systems. For maximum physical security of data, these connections may be by means of something akin to an A/B switch. Initially, peripheral access is only via the active system, but in the event of a fail-over, the switches can be flipped so that the peripherals are then connected to the standby system.
FIG. 3 is a block diagram illustrating an exemplary redundant configuration comprising two systems, A and B. In this example, assume that system B is presently the active system and that system A is the standby. Each system operates under its own control of the Unisys MCP operating system, and each system has its own MIT. The MIT for each system is stored in the Halt/Load Unit for that system which, as mentioned above, is a special system disk. The systems are connected through respective switches ("S") to two busses of disk units (D1, D2, D3 and D4, D5, D6, respectively). In this example, the busses to which the respective disk units are attached are SCSI busses, and the switches S are SCSI A/B switches. As illustrated by the arrow between disk units D1 and D4, these two disk units can comprise a mirrored set. That is, D1 will be a mirror of D4. Other mirrored sets can exist and may contain more than one disk unit.
In this example, since system B is the active system, the SCSI switches are set to connect the respective disks to that system's I/O subsystem. In the event of a failure on system B, the SCSI switches can be switched over to connect the respective disks to the standby system A. That system can then take over computing responsibilities.
A problem that arises with these redundant systems is how to accomplish "fail-over" of a customer's pack data farm when some or all of this data is on MCP-mirrored disks. Since almost none of the mirrored sets involved will have "closed" status at the time of an unforeseen failure, while they can be physically switched to the standby system, the packs cannot be brought online for use without breaking the mirrored sets. Thus, the standby system incurs the overhead of having to recreate all its mirrored sets for maximum data safety. And it may have to do so without knowing that one physical unit is the "wrong" choice as the source copy, because of peripheral problems that were occurring at the time of the fail-over. Moreover, until recreation of a given set is complete, failure of the source pack unit can completely disrupt site operations, ruining the maximized availability for which the redundant configuration was intended. Thus, there is a need for methods and apparatus for migrating (i.e., transferring) in-use mirrored sets from one system to another, at least for sets with a recovery option of DMS (mit_optionf=1). The present invention satisfies this need.