This invention relates to a multi-processor computer system including first and second processing sets (each of which may comprise one or more processors) which communicate with an I/O device bus.
The invention finds particular application in fault tolerant computer systems in which two or more processing sets need to communicate with an I/O device bus in lockstep, with provision for identifying lockstep errors so that faulty operation of the system as a whole can be detected.
It has been proposed to identify lockstep errors by comparing the I/O operations output by the processing sets. This is a reliable method as long as the outputs from the processing sets are deterministic, that is, as long as the outputs from the processing sets will be identical when the processors are operating correctly. This will normally be the case when the processors operate synchronously under the control of a common clock. However, some events may not be wholly deterministic. An example is where each processing set includes its own real time clock and those clocks are not synchronized with one another. If the real time clocks are sampled by the respective processing sets, the processing sets may obtain different values. For example, if the sample is taken at almost exactly the moment at which the clocks increment, one processing set may take its sample just before the increment and the other just after it.
If these differing values are output by the processing sets and evaluated by a comparator checking for lockstep operation, the mismatch will be flagged as a lockstep error even though neither processing set is faulty. A further problem is that each processing set may go on to operate on its own sampled value, which could cause the processing sets to diverge further.
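The comparator check and the clock-sampling race described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each real time clock increments once every 1000 microseconds and that the two clocks differ by a small fixed phase offset. The names `ProcessingSet`, `read_rtc`, `io_write`, `lockstep_compare`, the `"RTC_WRITE"` operation tag and the chosen offsets are all hypothetical and are not part of the system described here.

```python
from __future__ import annotations

from dataclasses import dataclass

TICK_US = 1000  # hypothetical: each real time clock increments once per 1000 us


@dataclass
class ProcessingSet:
    """Hypothetical model of one processing set with its own unsynchronized real time clock."""
    name: str
    phase_us: int  # hypothetical phase offset of this set's clock, in microseconds

    def read_rtc(self, t_us: int) -> int:
        # The clock value advances by one on each whole tick. Because the two
        # clocks have different phases, two sets sampling at the same instant
        # t_us can fall on opposite sides of an increment.
        return (t_us + self.phase_us) // TICK_US

    def io_write(self, t_us: int) -> tuple[str, int]:
        # The I/O operation output to the bus carries the sampled clock value.
        return ("RTC_WRITE", self.read_rtc(t_us))


def lockstep_compare(a: tuple[str, int], b: tuple[str, int]) -> bool:
    """Comparator check: True when the two I/O operations are identical."""
    return a == b


set_a = ProcessingSet("A", phase_us=0)
set_b = ProcessingSet("B", phase_us=3)

# Sampled well away from an increment: both sets output the same value.
print(lockstep_compare(set_a.io_write(500), set_b.io_write(500)))  # True

# Sampled just before set A's clock increments: set B has already incremented,
# so the comparator reports a lockstep error although neither set is faulty.
print(set_a.io_write(998), set_b.io_write(998))  # ('RTC_WRITE', 0) ('RTC_WRITE', 1)
print(lockstep_compare(set_a.io_write(998), set_b.io_write(998)))  # False
```

In this sketch the mismatch arises only from the sampling instant, not from any fault, which is the situation that a straightforward comparison of outputs cannot distinguish from a genuine lockstep error.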
These problems have constrained the design of lockstep systems, which have had to be designed to avoid potentially non-deterministic events. Accordingly, an aim of the invention is to overcome, or at least mitigate, these problems.