In the context of this functioning management, it is often useful to log the functioning of the primary application or of one of its processes, i.e. to record the data representing this functioning so that the running can later be reconstituted. While the primary application is running, this data is generated in the form of logging data and is transmitted to one or more secondary nodes for storage and backup.
For example, in order to trace and study the functioning of the primary application in detail, it is then possible to study or to reconstitute this functioning, later on or remotely, in a controlled and monitored manner.
Also, as an example, if the primary application experiences a failure, in particular a hardware failure, it is then possible to create a new standby application on a secondary node in order to replace the services provided by the primary application. This standby application can then be created in a known state, for example a restart point state recorded previously. From the logging data of the primary application, it is then possible to force the standby application to reconstitute the execution of the primary application up to the time of the failure. After this reconstitution, or replay, the standby application is in the same state as the primary application was at the last event whose logging data was received outside the primary node. If all the events preceding the failure have indeed been logged and transmitted, the standby application can then take over with little or no interruption of the service for the users.
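As an illustrative sketch only (the names and data layout below are hypothetical, not those of the described system), the checkpoint-plus-replay idea can be expressed as follows: the standby application starts from a recorded restart point state and re-applies the logged events in order, so that it reaches the state the primary application had at the last logged event.

```python
# Hypothetical sketch of checkpoint-plus-replay; all identifiers are
# illustrative and not part of the described system.

def apply_event(state, event):
    """Re-apply one logged event to the standby state.

    Each logged event records which resource it touched and the result
    observed on the primary node, so the replay is forced to match it.
    """
    kind, key, value = event
    if kind == "write":
        state[key] = value
    elif kind == "delete":
        state.pop(key, None)

def replay(checkpoint_state, event_log):
    """Rebuild the standby application's state from a restart point."""
    state = dict(checkpoint_state)  # start from the recorded restart point
    for event in event_log:         # events logged after the checkpoint
        apply_event(state, event)
    return state                    # state as of the last logged event

# Usage: a checkpoint followed by two logged events.
checkpoint = {"balance": 100}
log = [("write", "balance", 150), ("write", "owner", "alice")]
print(replay(checkpoint, log))  # {'balance': 150, 'owner': 'alice'}
```

If any events between the last transmitted log entry and the failure were lost, the replay stops at the last received event, which is why the text insists on all events preceding the failure having been logged and transmitted.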
However, many existing applications do not currently have such management functionalities, and it would be too complex and costly to modify them in order to add such functionalities to them.
The solution consisting of implementing these functionalities in the system software of the computer or of the primary node presents considerable drawbacks, such as the risks of errors, instability or incompatibility within the network, and the requirement for special skills in the field of system software.
In addition, the authors of this invention propose a solution in which these management functionalities are taken over by an intermediate application which is executed mainly in the user memory space and requires only a few modifications within the system software itself.
However, in this type of solution, inter alia, the creation and processing of logging data, as well as its transmission from the primary node to the secondary node, represent a significant calculation load with respect to the execution of the primary application itself, as well as for the communication networks used. In the prior art, the primary application then experiences such a loss of performance that this functioning management often cannot be used satisfactorily under operational conditions.
In fact, in order to be able to represent in a coherent, or even complete, manner the execution of the primary application, the events to be recorded and transmitted are often numerous. Moreover, the majority of these events correspond to operations whose execution is very fast, in particular events internal to the hardware or software resources of the primary node, for example a system call requesting the assignment of a semaphore or the reading of an item of data in memory.
By contrast, for each of these events, the generation and storage of the logging data, as well as its transmission, are much longer operations, in particular for the internal events.
In fact, logging each event is in itself a process which requires at least one, and frequently several, software operations, each of which constitutes a load and a working time at least equal to that of the logged event itself. Depending on the implementation and the type of internal event, logging thus increases the load or working time of each event by a factor which commonly ranges from 100 to 10,000.
Furthermore, the hardware and software protocols used for transmission outside a computer offer performance which is in general poor in relation to the number of events logged, which both disturbs the use of the network and constitutes a bottleneck for the performance of the primary application.
Certain solutions exist which make it possible to reduce the number of events to be logged, in particular by not logging events of a deterministic type, whose results can be reproduced at replay without recorded data.
An event, or the operation which constitutes it, in particular a software operation, can be qualified as deterministic if the result of its execution depends only on the initial conditions existing at the time of its initiation. More particularly, in the context of managing a unitary operation, an execution or a functioning as described here, an operation is termed deterministic if it is deterministic from the point of view of the process which initiated it, i.e. if the result which it returns to this process depends only on the initial state of this process. Similarly, a contiguous succession of deterministic operations may itself constitute a deterministic sequence.
Within the running of an application process, many of the operations performed are deterministic, in particular among the internal operations. For example, internal operations of a mathematical or logical type will more often than not be deterministic if they affect only resources forming part of the initial state of this process, which it alone can modify.
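As a hypothetical sketch of this definition (the function and state names are illustrative), a deterministic operation returns a result that depends solely on the initial state of the calling process, so executing it twice from the same initial state always yields the same result, and its replay therefore needs no logged data:

```python
# Illustrative only: a deterministic operation of a mathematical/logical
# type, acting only on process-local state that the process alone can modify.

def deterministic_step(state):
    """Result depends only on the initial state passed in."""
    return {
        "counter": state["counter"] + 1,        # arithmetic on local state
        "flag": state["counter"] % 2 == 0,      # logical test on local state
    }

s0 = {"counter": 4}
# Running the same step twice from the same initial state gives equal results,
# which is what allows such events to be omitted from the log.
print(deterministic_step(s0) == deterministic_step(s0))  # True
```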
A contrario, some operations applying to shared resources are often non-deterministic vis-à-vis such a process. For example, a request for the assignment of a shared semaphore, or of a “lock” covering a memory zone shared with other processes, may be non-deterministic. In fact, the result, i.e. whether or not this lock or this assignment is obtained, may depend on the state or actions of other processes, which may or may not have already reserved this resource.
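This non-determinism can be sketched with Python's standard `threading.Lock` (an illustrative stand-in for a shared semaphore; here a single thread plays the roles of two competitors for simplicity): the outcome of a non-blocking acquisition depends on whether the lock has already been reserved, i.e. on state outside the requesting process.

```python
# Hypothetical sketch: acquiring a shared lock is non-deterministic from the
# requesting process's point of view, because the result depends on whether
# another party has already reserved the resource.
import threading

shared_lock = threading.Lock()

def try_to_reserve():
    """Attempt to reserve the shared resource without blocking.

    The return value depends on state outside the caller: True only if
    no one currently holds the lock.
    """
    return shared_lock.acquire(blocking=False)

first = try_to_reserve()   # the lock was free, so this attempt succeeds
second = try_to_reserve()  # the lock is now held, so this attempt fails
print(first, second)       # True False

if first:
    shared_lock.release()  # release the resource for other users
```

Because the same request yields different results depending on prior reservations, such events cannot be recomputed at replay and must be logged.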
However, the replay and, in particular, the logging of non-deterministic events still constitute a loss of performance which could usefully be reduced. In particular, while the primary application is running, the logging operations represent a work load for the operational node, and can cause a fall-off in performance due to the action of the intermediate application.