This invention relates generally to a method and apparatus for checking the status of transactions on a computer system, and more specifically, to a method and apparatus for providing a client computer system with the status of a current transaction when a connection fails in a distributed processing environment.
The ability of a computer system to respond to a failed connection in a manner that minimizes lost data and the corruption of work in progress is known as the system's "fault tolerance." Generally, failed connections are the result of a power outage or a hardware failure. Fault tolerance of a computer system is typically accomplished with a battery-backed power supply, redundant hardware, provisions in the operating system, or a combination of these system-saving components.
In a fault-tolerant system, the computer system has the ability either to continue the system's operation without loss of data or to restart the system's operation and recover the transactions that were in progress when the fault occurred. A system failure affecting a network user generally falls into one or more of three major classes: (1) a failure on a client, generally a PC, (2) a failure of the network, and (3) a failure of the host system.
Various methods are employed to recover from a loss of connectivity. For example, a failure of a client PC in a network is generally followed by re-booting the client PC and restarting the client computer system. In contrast, a failure of a host is followed by a recovery method suited to the specific cause of the host failure. For example, the host failure could be due to a failed multi-LAN process, a failure of a client's message control process, or even a total system failure.
Similarly, the recovery method for a network failure will depend upon the specific cause of the failure. For example, a network failure could be due to an outage of the physical medium, a network repeater, a bridge, or a router. Each class of system failure described above with regard to either the host or the network could be caused by a design or operational error, an equipment outage, or environmental conditions.
Regardless of the class of failure that occurs on the client PC, host system, or network, conventional fault-tolerant systems only inform the client of the loss of connectivity. Conventional systems cannot recover the status of the transaction that was in progress when the connection was lost. When a client computer system restarts or regains connectivity, the client computer system needs to know the status of its current transaction so that it can determine whether or not to re-execute the current transaction. More specifically, the client computer system needs to know whether the current transaction committed before the system failure occurred.
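The decision facing the client after it regains connectivity can be illustrated with a minimal sketch. The sketch assumes, purely for illustration, that some status query is available to the client; the names `TxStatus`, `should_reexecute`, `recover_after_reconnect`, `query_status`, and `execute` are hypothetical and are not taken from any conventional system described above.

```python
from enum import Enum


class TxStatus(Enum):
    """Hypothetical outcomes of asking about an interrupted transaction."""
    COMMITTED = "committed"
    NOT_COMMITTED = "not_committed"


def should_reexecute(status: TxStatus) -> bool:
    # Re-run the transaction only if it did not commit before the failure.
    return status is not TxStatus.COMMITTED


def recover_after_reconnect(tx_id, query_status, execute):
    """Decide what to do with transaction tx_id once connectivity returns.

    query_status and execute are caller-supplied callables (assumptions):
    query_status(tx_id) returns a TxStatus, execute(tx_id) re-runs the work.
    """
    status = query_status(tx_id)
    if should_reexecute(status):
        execute(tx_id)
        return "re-executed"
    return "already committed"
```

Without the status query, the client has only two choices: re-execute and risk applying a committed transaction twice, or skip re-execution and risk losing a transaction that never committed.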
Although conventional fault-tolerant systems can recover from a system failure, there remains an unmet need to provide a client computer system with a way to recover effectively from a system failure that occurs before the client receives notice that a transaction has committed.