In the context of this functioning management, it is often useful to log the functioning of the primary application or of one of its processes, i.e. to record data representing this functioning so that the execution can be reconstituted. As the primary application executes, this data is generated in the form of logging data and is transmitted to one or more secondary nodes for storage and backup.
For example, in order to trace and study the functioning of the primary application in detail, it is then possible to examine or reconstitute this functioning, later on or remotely, in a controlled and monitored manner.
As a further example, if the primary application experiences a failure, in particular a hardware failure, it is then possible to create a new standby application on a secondary node in order to replace the services provided by the primary application. This standby application can be created in a known state, for example a previously recorded restart point state. From the logging data of the primary application, it is then possible to force the standby application to reconstitute the execution of the primary application up to the time of the failure. After this reconstitution, or replay, the standby application is in the same state as the primary application was after the last event for which the logging data was received outside the primary node. If all the events preceding the failure have been logged and transmitted before the failure, the standby application can then take over with little or no interruption of the service for the users.
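The restart-and-replay principle described above can be illustrated by a minimal sketch. It is not the implementation of the invention: the `App` class, the `(operation, key, value)` event format, the `replay` function and the in-memory list standing in for storage on a secondary node are all hypothetical, and the toy application is assumed to be fully deterministic.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical event format: (operation, key, value).
Event = Tuple[str, str, int]


@dataclass
class App:
    """Toy deterministic application whose state is a dict of counters."""
    state: Dict[str, int] = field(default_factory=dict)

    def apply(self, event: Event) -> None:
        op, key, value = event
        if op == "set":
            self.state[key] = value
        elif op == "add":
            self.state[key] = self.state.get(key, 0) + value
        else:
            raise ValueError(f"unknown operation: {op}")


def replay(checkpoint: Dict[str, int], log: List[Event]) -> App:
    """Create a standby from a restart point and re-apply the logged events."""
    standby = App(state=dict(checkpoint))  # copy, so the checkpoint is untouched
    for event in log:
        standby.apply(event)
    return standby


events: List[Event] = [("set", "x", 1), ("add", "x", 4), ("set", "y", 7), ("add", "y", 3)]

primary = App()
secondary_log: List[Event] = []  # stands in for storage on a secondary node

primary.apply(events[0])
checkpoint = dict(primary.state)  # restart point recorded after the first event

for e in events[1:]:
    primary.apply(e)
    secondary_log.append(e)  # logging data "transmitted" outside the primary node

# Case 1: every event was received before the failure.
standby = replay(checkpoint, secondary_log)

# Case 2: the logging data of the last event never left the primary node.
partial = replay(checkpoint, secondary_log[:-1])

print(standby.state == primary.state)  # standby matches the primary
print(partial.state)                   # state as of the last received event
```

In the second case the standby only reaches the state that followed the last event actually received, which is why complete and timely transmission of the logging data matters for a takeover without loss.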
Currently, however, many existing applications do not have such management functionalities, and it would be too complex and costly to modify them in order to add them.
The solution which consists of implementing these functionalities in the system software of the computer or of the primary node presents some considerable drawbacks, such as the risk of errors, instability or incompatibility within the network and the requirement for special skills in the field of systems software.
In addition, a solution is proposed by the authors of this invention, in which these management functionalities are taken over by an intermediate application which is mainly executed in the user memory space and requires only a few modifications within the system software itself.
However, in this type of solution, inter alia, the creation and processing of logging data, as well as its transmission from the primary node to a secondary node, represent a significant calculation load with respect to the execution of the primary application itself, as well as for the communication networks used. In the prior art, the master application then experiences such a loss of performance that, often, this functioning management cannot be used satisfactorily under production conditions.
In fact, in order to represent the running of the primary application in a coherent manner, or even a complete manner, the events to be recorded and transmitted are often very numerous. Moreover, the majority of these events correspond to operations whose execution is very fast, in particular the events which are internal to the hardware or software resources of the primary node, for example a system call requesting the assignment of a semaphore or reading an item of data in memory.
By contrast, for each of these events, generating, storing and transmitting the logging data is a much longer operation, in particular for the internal events.
In fact, logging each event is in itself a process which requires at least one and frequently several software operations, each of which constitutes a load and a working time at least equal to that of the logged event itself. Depending on the implementation and the type of internal event, the logging adds for each event a load or working time larger by a factor which commonly ranges between 100 and 10,000.
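To give an order of magnitude for this overhead, the following sketch applies a simple assumed model: the unlogged running time is normalised to 1, a fraction of it is spent in logged internal events, and logging each event costs a given multiple of the time of the event itself. The function name `relative_runtime` and the 1% fraction used in the example are illustrative assumptions, not figures from the text; only the factors 100 and 10,000 come from the range stated above.

```python
def relative_runtime(event_fraction: float, logging_factor: float) -> float:
    """Running time with logging, relative to the unlogged running time (= 1).

    event_fraction: share of the unlogged running time spent in logged events.
    logging_factor: logging cost of an event as a multiple of the event's own cost.
    """
    if not 0.0 <= event_fraction <= 1.0:
        raise ValueError("event_fraction must lie between 0 and 1")
    if logging_factor < 0:
        raise ValueError("logging_factor must be non-negative")
    # The original work is unchanged; logging adds event_fraction * logging_factor.
    return 1.0 + event_fraction * logging_factor


# Hypothetical workload: 1% of the running time is spent in internal events.
for factor in (100, 10_000):
    print(f"factor {factor}: {relative_runtime(0.01, factor):.1f}x the unlogged running time")
```

Under these assumed numbers, even a workload in which internal events account for only 1% of the running time would take about twice as long at a factor of 100 and about a hundred times as long at a factor of 10,000, which is consistent with the loss of performance described for the prior art.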
Furthermore, the hardware and software protocols used for transmission to the outside of a computer generally perform poorly in relation to the number of events logged, which both disturbs the use of the network and creates a bottleneck for the performance of the master application. In particular, while the master application is running, the logging operations represent a work load for the operational node, and can cause a fall-off in performance due to the action of the intermediate application.