The invention relates to multi-computer systems. More particularly, the invention relates to methods and equipment for recovery processing in the event of a computer failure.
Multi-computer systems employ a plurality of computers. A single computer runs a single kernel, whereas a multi-computer runs multiple kernels. As so defined, a single computer may include one or many processors. The constituent computers in a multi-computer may be in distinct physical units (e.g., chassis or circuit boards), may be individual processors together in the same physical unit, or may be some combination of both. One example of a multi-computer system is one in which redundant computers are present. The use of redundant devices in computer systems is advantageous in critical applications because the redundant devices increase system availability, survivability, and robustness. In a multi-computer system, one or more computers may be redundant computers able to take over the processing workload abandoned by a primary computer that has failed. A redundant computer may be connected to peripheral storage devices (e.g., disk drives) through hardware paths separate from those used to connect a primary computer to the peripheral storage devices. A redundant computer may be inactive unless and until the primary computer fails. Alternatively, multiple active computers in the same scalable computer system may be redundant with respect to each other in the sense that one of the computers can take over execution of an application program (also called simply “application”) originally executed by a computer that subsequently fails.
A computer stores data to peripheral storage by issuing one or more “write requests” (sometimes simply referred to as “writes”) to a peripheral storage device. Typically, write operations are asynchronous, i.e., the computer issues a write request and is notified of its completion some time later. A busy computer may have multiple writes outstanding at the same time. Once issued, a write request may be delayed, perhaps indefinitely. Sources of indefinite delay include failure of communications channels between the computer and the storage device, and failures within the storage device itself. Normally, a computer “times out” such writes and reissues them if necessary. Failing receipt of a write completion acknowledgment, the requesting computer typically records the failed state of the peripheral storage device. When an operating computer fails, it may have pending writes that have been issued but neither completed nor timed out. When a pending write request from a failed computer is effectuated (i.e., actually written on a storage device), the phenomenon is referred to herein as a “ghost write.” That is, a ghost write is effectuated on behalf of a dead computer.
A ghost write may seriously interfere with recovery processing whereby a redundant computer takes over for the failed computer. When an operating computer fails, a recovery routine is executed to transfer application programs (also called, more simply, “applications”) to a redundant computer. An application is typically terminated and then restarted on the redundant computer. Recovery of an application following unexpected termination usually involves reading peripheral storage utilized by the application and analyzing the data to determine the state of the application program at the time of the failure. If necessary, the state of the peripheral storage devices is altered so as to eliminate partially completed writes. The objective of the alteration is to back up in time to a known good, consistent state, because partially completed writes place the storage devices' contents into an inconsistent state. A ghost write can interfere with the recovery processing by either (1) corrupting the data read from the peripheral storage or (2) overwriting corrections made for the purpose of restoring consistency.
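By way of illustration only, the second form of interference (a ghost write overwriting a correction made during recovery) can be reproduced in a toy model. The following Python sketch is not part of the described system: the `StorageDevice` class, its `issue_write` and `effectuate_pending` methods, and the block contents are all hypothetical names chosen for the example, and recovery is simplified to a direct update of one block.

```python
from dataclasses import dataclass, field


@dataclass
class StorageDevice:
    """Toy block store (hypothetical): writes are queued, then effectuated later."""
    blocks: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)  # (issuer, block, data) tuples

    def issue_write(self, issuer, block, data):
        # Asynchronous write: the request is only queued at this point.
        self.pending.append((issuer, block, data))

    def effectuate_pending(self):
        # Delayed requests finally reach the medium, in issue order.
        for _issuer, block, data in self.pending:
            self.blocks[block] = data
        self.pending.clear()


def demo_ghost_write():
    disk = StorageDevice(blocks={0: "consistent"})
    # The primary computer issues a write and then fails before it completes.
    disk.issue_write("primary", 0, "partial-update")
    # Recovery on the redundant computer restores a known good state
    # (simplified here to a direct update of the block).
    disk.blocks[0] = "recovered"
    # The dead computer's delayed request is effectuated afterwards.
    disk.effectuate_pending()
    return disk.blocks[0]


print(demo_ghost_write())  # partial-update
```

In this model the correction made during recovery is lost because the queued request from the failed computer reaches the medium after recovery has finished.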
In one respect, the invention is a method for preventing ghost writes in a multi-computer system. A first computer in the multi-computer system issues one or more write requests to a storage device, each write request normally being effectuated after being pending for a time. The method generally comprises the steps of detecting a condition indicative of a failure associated with the first computer and preventing the effectuation of write requests issued by the first computer to the storage device and pending at the time of the detected condition. In one embodiment, the condition indicative of a failure associated with the first computer comprises at least one of the group consisting of a reduction in the number of known operating computers in the multi-computer system, and a state change of a volume group associated with the computer on the storage device from inaccessible to accessible and a reduction in the number of known operating computers in the multi-computer system. In another embodiment, the failure is a communication failure associated with one or more write requests to a storage device connected along a plurality of redundant communications channels. In yet another embodiment, the preventing step comprises at least one action from the group consisting of issuing a fibre channel target reset on a fibre channel connected to the storage device, issuing a fibre channel remote logout on a fibre channel connected to the storage device, issuing a fibre channel remote logout and requiring logout from multiple fibre channels on a fibre channel connected to the storage device, and issuing a bus device reset on a bus connected to the storage device.
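The two steps of the method (detecting a condition, then preventing effectuation of the pending writes) can be sketched as follows. This is a minimal Python model under stated assumptions, not the claimed implementation: the function names `membership_reduced` and `prevent_ghost_writes` are hypothetical, only the "reduction in the number of known operating computers" condition is modeled, and pending requests are filtered by issuer purely for clarity, whereas an actual reset or logout on a channel or bus may behave differently.

```python
def membership_reduced(previous, current):
    """True when fewer computers are known to be operating than before."""
    return len(set(current)) < len(set(previous))


def prevent_ghost_writes(pending, previous, current):
    """Drop pending writes whose issuer is no longer a known operating computer.

    `pending` is a list of (issuer, block, data) tuples (a hypothetical layout).
    Returns the list of requests that are still allowed to be effectuated.
    """
    if not membership_reduced(previous, current):
        return list(pending)  # no failure condition detected; keep everything
    alive = set(current)
    return [write for write in pending if write[0] in alive]


pending = [("A", 0, "x"), ("B", 1, "y"), ("A", 2, "z")]
survivors = prevent_ghost_writes(pending, previous=["A", "B"], current=["B"])
print(survivors)  # [('B', 1, 'y')]
```

Here computer A has dropped out of the set of known operating computers, so its two pending requests are discarded before they can be effectuated, while the request from B is left untouched.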
In other respects, the invention is computer software embedded on a computer readable medium. The computer software comprises instructions for implementing the methods just summarized.
In yet another respect, the invention is an apparatus. The apparatus comprises a first computer, a storage device, at least one adapter connected between the first computer and the storage device, and an application executing on the first computer. The adapter comprises a protocol capable of selectively eliminating write requests issued to the storage device before effectuation of the selected write requests. The application comprises an activation procedure that executes upon startup of the application. The activation procedure is connected to the adapter and commands the adapter to eliminate uneffectuated write requests. In one embodiment, the activation procedure executes following at least one event from the group consisting of a reduction in the number of known operating computers in the multi-computer system, and a state change of a volume group associated with the computer on the storage device from inaccessible to accessible and a reduction in the number of known operating computers in the multi-computer system. In another embodiment, the activation procedure performs at least one action from the group consisting of issuing a fibre channel target reset on a fibre channel connected to the storage device, issuing a fibre channel remote logout on a fibre channel connected to the storage device, issuing a fibre channel remote logout and requiring logout from multiple fibre channels on a fibre channel connected to the storage device, and issuing a bus device reset on a bus connected to the storage device.
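The relationship between the application's activation procedure and the adapter can likewise be illustrated with a short Python sketch. All names here (`Adapter`, `Application`, `eliminate_uneffectuated`, `activate`, and the `"target_reset"` label) are hypothetical and stand in for whatever interface a real adapter exposes; no actual fibre channel or bus command is issued.

```python
class Adapter:
    """Hypothetical adapter sitting between a computer and a storage device."""

    def __init__(self):
        self.queue = []  # write requests issued but not yet effectuated
        self.log = []    # (action label, number of requests dropped)

    def submit(self, request):
        self.queue.append(request)

    def eliminate_uneffectuated(self, action="target_reset"):
        # `action` is only a label in this model; nothing is sent on a channel.
        dropped = len(self.queue)
        self.queue.clear()
        self.log.append((action, dropped))
        return dropped


class Application:
    """Hypothetical application whose activation procedure runs at startup."""

    def __init__(self, adapter):
        self.adapter = adapter
        self.started = False

    def activate(self):
        # Command the adapter to eliminate uneffectuated write requests.
        dropped = self.adapter.eliminate_uneffectuated()
        self.started = True
        return dropped


adapter = Adapter()
adapter.submit(("failed-primary", 7, "stale"))
app = Application(adapter)
print(app.activate(), adapter.queue)  # 1 []
```

In this sketch the stale request left behind by the failed computer is removed from the adapter's queue as part of application startup, so it can no longer be written to the storage device.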
In comparison to the prior art, certain embodiments of the present invention are capable of achieving certain advantages, including the ability to recover more satisfactorily from a computer failure when there is the potential for ghost writes.
Those skilled in the art will appreciate these and other advantages and benefits of various embodiments of the invention upon reading the following detailed description of a preferred embodiment with reference to the drawings.