Software-based fault-tolerant systems may be considered as organised into one or more recovery units, each of which constitutes a unit of failure and recovery. A recovery unit may be considered as made up of a live process, an arrangement for logging recovery information relevant to that live process, and recovery means which, in the event of failure of the live process, causes a replacement process to take over.
Of course, if failure of the live process due to failure of the processor running it is to be covered, then both the storage of recovery information and the recovery means itself must be separate from the processor running the live process.
Where a system comprises multiple recovery units, these will typically overlap in terms of processor utilisation; for example, the processor targeted to run the replacement process for a first recovery unit may also be the processor running the live process of a second recovery unit. In fact, there may also be common resource utilisation by the recovery units in respect of their logging and recovery means.
An illustrative prior-art fault-tolerant computer system is shown in FIG. 1 of the accompanying drawings. This system comprises three processors I, II, III and a disc unit 10 all interconnected by a LAN 11. The system is organised as two recovery units A and B, each of which has an associated live process A/L, B/L. Live process A/L runs on processor I and live process B/L runs on processor II. Recovery unit A is arranged such that upon failure of its live process A/L, a replacement process A/R will take over on processor II; similarly, recovery unit B is arranged such that should live process B/L fail, a replacement process B/R takes over on processor III.
A live process will progress through a succession of internal states depending on its deterministic behaviour and on non-deterministic events such as external inputs (including messages received from other live processes, where present) and non-deterministic internal events.
When a replacement process takes over from a failed live process, the replacement process must be placed in a state that the failed process achieved (though not necessarily its most current pre-failure state). To do this, state information on the live process must be known for at least one point prior to failure; furthermore, if information is also known on the non-deterministic events experienced by the failed process, it is possible to run the replacement process forward from the state known about for the failed process to some later state achieved by the failed process.
Where speed of recovery is not critical, an approach may be used where state information on the live process (process A/L in FIG. 1) is periodically checkpointed by the logging means of the recovery unit from the volatile memory of the processor running the process to stable store (disc unit 10). Upon failure of the live process A/L, the recovery means of the recovery unit can bring up a replacement process A/R in a state corresponding to the last-checkpointed state of the failed live process. Of course, unless checkpointing is effected at every state change, the state of the replacement process A/R will generally be behind the actual state achieved by the live process prior to failure. This can be alleviated by having the logging means of the recovery unit securely store appropriate information on all non-deterministic events experienced by the live process between its checkpoints and then arranging for the recovery means to replay these events to the replacement process to bring it more up-to-date.
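By way of illustration only, the checkpoint-and-replay approach described above may be sketched as follows. This is a minimal Python sketch and not a description of any particular prior-art implementation; the names `CounterProcess`, `StableStore` and `recover`, and the use of a simple running total as the process state, are assumptions made purely for the example.

```python
import copy


class CounterProcess:
    """Hypothetical process whose state is a running total (assumption)."""

    def __init__(self, state=0):
        self.state = state

    def handle(self, event):
        # Deterministic transition driven by a non-deterministic input event.
        self.state += event


class StableStore:
    """Hypothetical stand-in for stable storage such as a disc unit."""

    def __init__(self):
        self.checkpoint = 0
        self.event_log = []

    def save_checkpoint(self, state):
        # A new checkpoint supersedes the events logged before it.
        self.checkpoint = copy.deepcopy(state)
        self.event_log.clear()

    def log_event(self, event):
        self.event_log.append(event)


def recover(store):
    """Bring up a replacement from the last checkpoint, then replay the log."""
    replacement = CounterProcess(copy.deepcopy(store.checkpoint))
    for event in store.event_log:
        replacement.handle(event)
    return replacement


store = StableStore()
live = CounterProcess()
for i, event in enumerate([5, 3, 7, 2]):
    store.log_event(event)        # log the event before acting on it
    live.handle(event)
    if i == 1:                    # checkpoint after the second event only
        store.save_checkpoint(live.state)

replacement = recover(store)      # the live process is assumed to have failed
```

In this sketch the checkpoint alone would place the replacement at a total of 8, whereas replaying the two events logged after the checkpoint brings it to 17, the state the live process had reached.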
Where speed of recovery is critical, it is generally preferred to run at least one replicate process (process B/R in FIG. 1) that shadows the processing of the live process B/L and receives the same non-deterministic events as the latter; in this context, the live and replicate processes are also known as the primary and secondary processes respectively. The replicate process B/R effectively acts as a store of state information on the live process B/L. The live process and its replicate may be tightly coupled so that they are always in step, or loosely coupled with the replicate generally lagging behind the live process. Upon failure of the live process B/L, the recovery means causes the replicate process to take over as the replacement process B/R; where the coupling between the live process and its replicate is only loose, if appropriate information on the non-deterministic events experienced by the live process has been stored, the replicate may be brought more up-to-date by using this information.
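The loosely coupled arrangement may be illustrated by the following minimal Python sketch, offered under stated assumptions rather than as a definitive implementation. The class names `Primary` and `Replica`, the `promote` method and the running-total state are hypothetical; the sketch also assumes that events forwarded by the primary reach the replicate reliably.

```python
from collections import deque


class Replica:
    """Hypothetical loosely coupled replicate: queues events, applies them later."""

    def __init__(self):
        self.state = 0
        self.pending = deque()

    def receive(self, event):
        # Store the non-deterministic event without processing it yet.
        self.pending.append(event)

    def step(self):
        # Process one queued event, if any, so the replicate may lag the primary.
        if self.pending:
            self.state += self.pending.popleft()

    def promote(self):
        # On failure of the primary, catch up on stored events and take over.
        while self.pending:
            self.step()
        return self.state


class Primary:
    """Hypothetical live process that forwards each event to its replicate."""

    def __init__(self, replica):
        self.state = 0
        self.replica = replica

    def handle(self, event):
        self.replica.receive(event)   # the replicate sees the same event
        self.state += event


replica = Replica()
primary = Primary(replica)
for event in [4, 1, 6]:
    primary.handle(event)

replica.step()                        # the replicate has processed one event only
lagging_state = replica.state
promoted_state = replica.promote()    # the primary is assumed to have failed
```

Here the replicate lags at a total of 4 while the primary has reached 11; on promotion, the stored events bring the replicate up to the primary's state.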
The present invention is concerned with software fault-tolerant systems that employ logging to a replicate process.
It will be apparent that the overhead involved in logging recovery information is considerably greater in arrangements where the replacement process is brought up to the state of the live process at failure. In fact, it is not necessary to put the replacement process in the same state as the live process at failure; instead, the need is to put the replacement process into the last externally visible state, meaning the last state in which the recovery unit produced output either outside the fault-tolerant system or to other recovery units in the system. Put in other words, the requirement is that there be no lost states as perceived from outside the recovery unit. Because it may not be possible to control events external to the system, before any system-external output is made, the logging means of the recovery unit is generally caused either to checkpoint the live-process state information and securely store non-deterministic event information, or, in the case of a loosely coupled replicate process, to ensure that the latter has received the same non-deterministic event information as the live process. This procedure is known as "output commit" and can constitute a substantial processing overhead.
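The output commit procedure for a loosely coupled replicate may be sketched as follows. This is an illustrative Python sketch only; the classes `Replicate` and `LiveProcess`, the event-count acknowledgement and the running-total state are assumptions introduced for the example and are not drawn from any particular system.

```python
class Replicate:
    """Hypothetical replicate that holds the events forwarded to it."""

    def __init__(self):
        self.received = []

    def deliver(self, events):
        self.received.extend(events)
        return len(self.received)     # acknowledgement: number of events held


class LiveProcess:
    """Hypothetical live process that forwards events lazily, at output time."""

    def __init__(self, replicate):
        self.replicate = replicate
        self.state = 0
        self.events_seen = 0
        self.unsent = []              # events not yet passed to the replicate
        self.external_outputs = []

    def handle(self, event):
        self.state += event
        self.events_seen += 1
        self.unsent.append(event)

    def output(self):
        # Output commit: the replicate must hold every event seen so far
        # before any externally visible output is released.
        acked = self.replicate.deliver(self.unsent)
        self.unsent = []
        if acked != self.events_seen:
            raise RuntimeError("replicate is missing events; output withheld")
        self.external_outputs.append(self.state)


rep = Replicate()
live = LiveProcess(rep)
live.handle(2)
live.handle(3)
held_before_output = list(rep.received)   # nothing forwarded yet
live.output()                              # commit, then release the output
live.handle(10)                            # later event, not yet committed
```

After the output is made, the replicate holds exactly the events needed to reproduce the last externally visible state (a total of 5); the subsequent event is not yet held by the replicate, which is acceptable because no output has depended on it.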
The same procedure can be used in relation to output made to other recovery units in the fault-tolerant system. If this is not possible (for example, because the overhead involved in providing this ability is considered too great), then the recovery means will need to "roll back" the non-failed live processes to states consistent with the one into which the replacement process can be put. Rollback is, however, a complex procedure and generally not attractive.
It is an object of the present invention to provide a simplified arrangement for ensuring that there are no lost states when a replicate process is promoted to take over from a failed live process.