Replicating, on a replication machine, programs executing on an operational machine includes recording and replaying events that produce non deterministic results. For events producing non deterministic results, which are recorded on the operational machine, the event information is transferred from the operational machine to the replication machine for replay. The cost is processing overhead on both machines and the communication cost of transferring the event information. Events producing deterministic results during a program execution are not recorded, as they can be reproduced by simple checkpointing and re-execution of the program on the replication machine after restoring the checkpointed environment.
During the execution of a program, an access to the system clock typically produces a non deterministic event, which is recorded on the operational machine, transferred from the operational machine to the replication machine and replayed on the replication machine using the event information. This standard solution may be acceptable when programs are replicated in order to debug them. In this case the operational machine used for application recording and the replication machine used for application replay may be the same machine, the replay being necessarily a ‘later replay’. This standard solution may also be acceptable when replication is done for legal archiving purposes. For a fault tolerant system, wherein executing programs need to be restored on a backup machine in case of failure of the operational machine, this standard solution is not acceptable, as the fault tolerant system needs an immediate switch from the operational machine to a backup machine in case of failure. The backup machine is kept active; the operational machine records event data and periodically transfers these data to the backup machine. The backup machine uses the event data received at each transfer to replicate the program execution.
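The record, transfer and replay of system clock accesses described above can be illustrated by the following minimal sketch. It is not taken from any cited implementation: the class names RecordingClock and ReplayingClock, the in-memory list standing in for the transferred event information, and the toy program are all assumptions made for illustration.

```python
import time


class RecordingClock:
    """Operational side (hypothetical): reads the real clock and logs each value."""

    def __init__(self, source=time.time):
        self._source = source
        self.log = []  # stands in for the event information to be transferred

    def now(self):
        value = self._source()
        self.log.append(value)  # one logged event per clock access
        return value


class ReplayingClock:
    """Replication side (hypothetical): returns logged values, never the local clock."""

    def __init__(self, log):
        self._values = iter(log)

    def now(self):
        try:
            return next(self._values)
        except StopIteration:
            raise RuntimeError("replay ran past the end of the transferred log") from None


def program(clock):
    """Toy program whose result depends on two clock readings."""
    start = clock.now()
    end = clock.now()
    return (start, end)


# Record on the operational side, then replay from a copy of the log.
recorder = RecordingClock()
recorded_result = program(recorder)
replayed_result = program(ReplayingClock(list(recorder.log)))
```

In this sketch every clock access adds one entry to the log, so an application accessing the clock thousands of times per second would log and transfer thousands of entries per second, which is the cost discussed in this section.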
In the rest of the document replication of executing programs may designate replication per application or per operating system depending on the type of virtualization technology.
If the application to be replicated implements communication protocols or is a transactional application such as a server application, it accesses the clock intensively, possibly up to several thousand times per second, which leads to a significant slowdown of the replication process. This is a serious problem when replication is used in a fault tolerant system, wherein an application runs on an operational machine and its execution is immediately and entirely replicated on a second machine in order to recover immediately in case of failure of the primary machine.
Hardware based solutions are the most reliable solutions for fault tolerant systems with on the fly recovery, and they are successful today. A so called ‘lockstep mode’ of replication consists in replicating execution instruction by instruction on two synchronous CPUs driven by one unique clock. It allows a single physical clock to drive several processors, by extending the bus between two motherboards and forwarding the clock signal. It implies a clock transfer at a 10 MHz rate over an optical fiber, the operational and backup computer systems being tightly coupled. This mode does not allow the computer systems to be truly distributed. It also requires homogeneous replica systems (same processors running at the same speed). The Integrity NonStop Servers of Hewlett-Packard, which have Itanium based processors, use additional redundant CPUs running the same instruction stream. When a fault is detected (e.g. by lockstep mismatch), the failing module is disabled while the redundant module continues processing the instruction stream without interruption. The Stratus© ftServers© of Stratus Technologies are similar. These hardware based fault tolerant systems have physical constraints, such as the requirement that the operational and backup systems be in the same building. Furthermore, the hardware solution implies replicating the entire physical system and does not provide the granularity of virtualization by application offered by software implementable solutions.
In order to avoid the hardware limitations, it seems preferable to return to a software implementable solution. A way is needed to replicate a virtualized application in a fault tolerant manner while avoiding the record, transfer and replay of each system clock access request in the virtualized application, which is too costly.
The patent application WO2006/079623, ‘Method for counting instructions for logging and replay of a deterministic sequence of events’, assigned to International Business Machines Corporation, deals with a ‘record and replay’ virtualization of applications, which is a pure software implementable virtualization solution. For fault tolerance purposes a backup machine needs to be maintained at all times in the same state as the operational machine. The application code executing on the operational machine can be replayed by reexecuting the application code on the backup machine. However, this is theoretical, because the events (interruption signals, system calls, etc.) occurring during code execution cannot be ‘reexecuted’ on the replication machine. The occurrence of an event requires immediate recording, transfer of the event data to the backup machine and replay of the event on the backup machine. To avoid these costly record, transfer and replay steps at each occurrence of a deterministic event, which can be identified by its point in the instruction execution flow, the patent application suggests identifying the event occurrence by the number of instructions already executed by the application code. In this way, during the reexecution of the application code on the backup machine, the occurrence of each event can be reproduced by counting the number of instructions already executed in the application code. On the replication machine, an overflow of the user instruction count (provided by the processor as described hereunder) is initialized beforehand in order to determine a number of instructions to be executed from the start of the replay period, the overflow of which causes an interruption of the replay task. It is sufficient to maintain in each machine a synchronized counter of the user instructions executed in the code.
To this effect, the performance monitor unit provides the user instruction count (uic), that is, the number of instructions executed in the user space, without counting the instructions executed in the kernel space. The cited patent application proposes a way of counting the uic exactly for each task on the operational machine and of maintaining synchronization of the uic on the backup machine. This implies an exact replication of the application on the backup machine by simple reexecution of the application code, including the deterministic events, for which the exact point of execution is indicated by the uic value. More precisely, the uic is kept synchronized in the PMC of the operational and the replication machines. The uic is reset to zero at each occurrence of an event, thus avoiding uic counter overflows. Thus, to replay the virtualized application, the backup machine reexecutes the code of the application, including deterministic events, until the occurrence of a non deterministic event. Non deterministic events are replayed from the log.
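The principle of locating an event by the user instruction count can be illustrated by the following simplified simulation. It is a sketch under stated assumptions and not the method of the cited patent application: the functions record and replay, the representation of the program as a list of instruction labels, and the event_points mapping are hypothetical, and the replay side compares a software counter with the logged value instead of arming a hardware counter overflow interruption.

```python
def record(instructions, event_points):
    """Operational side (hypothetical): execute instructions and, at each event,
    log the uic reached since the previous event, then reset the uic to zero."""
    trace, log = [], []
    uic = 0
    for index, instruction in enumerate(instructions):
        trace.append(instruction)
        uic += 1
        if index in event_points:
            name = event_points[index]
            log.append((uic, name))  # event located by its uic value only
            trace.append("event:" + name)
            uic = 0  # reset at each event occurrence
    return trace, log


def replay(instructions, log):
    """Replication side (hypothetical): reexecute the same instructions and
    deliver each logged event when the local uic reaches the logged value."""
    pending = list(log)
    trace = []
    uic = 0
    for instruction in instructions:
        trace.append(instruction)
        uic += 1
        if pending and uic == pending[0][0]:
            _, name = pending.pop(0)
            trace.append("event:" + name)
            uic = 0
    return trace


# Six user instructions; events occur after the 2nd and the 5th instruction.
instructions = ["i0", "i1", "i2", "i3", "i4", "i5"]
event_points = {1: "signal", 4: "syscall"}
recorded_trace, event_log = record(instructions, event_points)
replayed_trace = replay(instructions, event_log)
```

With these inputs the log holds the pairs (2, "signal") and (3, "syscall"), and the replayed trace is identical to the recorded one. Only the position of each event is taken from the log here; the sketch does not model events whose returned data differ from one execution to the next.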
System clock accesses are non deterministic events because the values returned by the system clock are different each time the system clock is accessed. Thus, the solution of the cited patent application does not allow replication of non deterministic events such as system clock access requests by simple reexecution of the application code on the backup machine, that is, without replaying the event from the log.
There is thus still a need for a solution for replicating accesses to the system clock in a way usable in fault tolerant system configurations.