In the context of this functioning management, it is often useful to log the functioning of the primary application or of one of its processes, i.e. to record data that represents this functioning and enables the execution to be reconstituted. While the primary application is running, this data is generated in the form of logging data and transmitted to one or more secondary nodes for storage and backup.
For example, in order to trace and study the functioning of the primary application in detail, it is then possible to study or reconstitute this functioning later or remotely, in a controlled and monitored manner.
As a further example, if the primary application experiences a failure, in particular a hardware failure, it is then possible to create a new standby application on a secondary node in order to replace the services provided by the primary application. This standby application may be created in a known state, for example a previously recorded restart point state. Using the logging data of the primary application, it is then possible to force the standby application to reconstitute the execution of the primary application up to the time of the failure. After this reconstitution, or replay, the standby application is in the state that the logged application had reached after the last event whose logging data was received outside the primary node. If all the events preceding the failure have been logged and transmitted before the failure, the standby application may then take over with little or no interruption of the service for the users.
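The restart-and-replay principle described above can be illustrated with a minimal sketch. It assumes a deterministic application whose state changes only through logged events. The names `CounterApp`, `apply` and `replay`, as well as the event format, are hypothetical and serve only as an illustration. The transmission to the secondary node is stood in for by appending to a list.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class CounterApp:
    """Hypothetical deterministic application: its state is a dict of counters."""
    state: dict = field(default_factory=dict)

    def apply(self, event):
        # An event is a (key, delta) pair; applying it is fully deterministic.
        key, delta = event
        self.state[key] = self.state.get(key, 0) + delta


def replay(checkpoint, log):
    """Create a standby from a restart point, then replay the logged events."""
    standby = CounterApp(state=copy.deepcopy(checkpoint))
    for event in log:
        standby.apply(event)
    return standby


# Primary node: run, record a restart point, then keep running while logging.
primary = CounterApp()
primary.apply(("orders", 1))
checkpoint = copy.deepcopy(primary.state)  # restart point state

received_log = []  # logging data as stored on the secondary node (assumption)
for event in [("orders", 2), ("refunds", 1), ("orders", 5)]:
    primary.apply(event)
    received_log.append(event)  # stands in for transmission to the secondary node

# Secondary node: after a failure of the primary, rebuild the state by replay.
standby = replay(checkpoint, received_log)
assert standby.state == primary.state
```

If only part of the log reached the secondary node, the same `replay` call yields the state after the last received event, which is why the completeness of the transmitted log determines how much of the execution can be recovered.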
However, many existing applications currently lack such management functionalities, and modifying them in order to add these functionalities would be too complex and costly.
The solution that consists of implementing these functionalities in the system software of the computer or of the primary node has considerable drawbacks, such as the risk of errors, instability or incompatibility within the network, and the requirement for special skills in the field of system software.
In addition, the authors of this invention propose a solution in which these management functionalities are taken over by an intermediate application that is executed mainly in the user memory space and requires only a few modifications to the system software itself.
However, in this type of solution, the transmission of the logging data from the primary node to a secondary node, among other things, represents a significant computational load relative to the execution of the primary application itself, as well as a load on the communication networks used. In the prior art, the primary application then suffers such a loss of performance that this functioning management often cannot be used satisfactorily under production conditions.
In fact, in order to represent the execution of the primary application coherently, or even completely, the events to be recorded and transmitted are often very numerous. Moreover, the majority of these events correspond to operations that execute very quickly, in particular the events that are internal to the hardware or software resources of the primary node, for example a system call requesting the assignment of a flag or reading an item of data in memory.
By contrast, for each of these events, transmitting the logging data from the user memory space is a much longer operation, in particular for internal events.
Indeed, this data must be passed to the system software, which manages and processes it according to a number of network protocols, for example the TCP protocol followed by the IP protocol, before transmitting it via communication means, for example a network card. These network protocols represent a significant computational load relative to the duration of an event. Moreover, in particular in an existing network, the performance of the means of communication from one computer to another is generally poor in relation to the number of events, because more often than not they are designed only for intermittent data transfers. As a result, the time needed to transmit the logging data of an event can exceed the execution time of the event itself by a factor commonly ranging from 100 to more than 10,000. In particular, while the primary application is running, the logging operations represent a workload for the operational node and can cause a fall-off in performance due to the action of the intermediate application.
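The order of magnitude of this overhead can be illustrated with a short calculation. This is a simplified sketch under stated assumptions: each logged event is transmitted synchronously and individually, so that its transmission time is simply added to the execution time of the event. The function `slowdown` and its parameters are hypothetical names introduced for illustration, and the one-microsecond event duration is an arbitrary example value, not a figure from the text.

```python
def slowdown(event_time_us, ratio, logged_fraction=1.0):
    """Return the factor by which execution is lengthened by logging.

    event_time_us   -- execution time of one event, in microseconds (assumed)
    ratio           -- transmission time divided by event time (100 to 10,000)
    logged_fraction -- share of events whose logging data is transmitted
    """
    transmit_time_us = ratio * event_time_us
    total_time_us = event_time_us + logged_fraction * transmit_time_us
    return total_time_us / event_time_us


# Every event logged, at both ends of the range quoted above.
low = slowdown(1.0, 100)       # 101.0
high = slowdown(1.0, 10_000)   # 10001.0

# Logging only half of the events still leaves a large overhead.
half = slowdown(2.0, 100, logged_fraction=0.5)  # 51.0
```

Under these assumptions, the slowdown is 1 + logged_fraction × ratio, independent of the event duration, which shows why the per-event transmission cost dominates the execution time of the primary application.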