In order to analyse such an application, make it more reliable, make it more flexible or improve its performance, it is known to record the events occurring in the application so that they can be replayed, i.e. re-executed or reproduced identically, at another time or on another computer. However, current methods that record events as they occur consume a great deal of processing time and tend to slow the application down excessively in normal use.
In addition, if an application in operational use was not designed from the outset to produce such a record, adding such functions to it later is difficult and costly, and carries a significant risk of errors.
Some such methods are also used by tuning or “debugging” programs, which allow the functioning of an application to be monitored from outside. However, more often than not, these methods act within the computer system that executes the application, for example by modifying kernel modules of the system or adding new ones. Such system changes require specific system skills and can introduce heterogeneities between the computers of a network, which can be a source of errors and instability. More often than not, these disadvantages restrict the use of the record and replay principle to tuning tasks or isolated configurations, and make it unacceptable for configurations that are both large and heavily loaded in actual operation.
A method of recording and replay (“record/replay”) is described, for example, in the 2002 article entitled “Debugging shared memory parallel programs using record/replay” by the Belgian authors Ronsse, Christiaens and De Bosschere, published by Elsevier B.V. This article describes a method for tracing the functioning of a multi-process application in order to debug it. To reduce the performance degradation caused by event recording, the article proposes using intrusive methods to detect those situations in which the relative order of independent events accessing the same shared resource is uncertain (“race conditions”), and restricting recording to these situations.
However, this solution remains limited to debugging, usually outside networks in operation, and relies on intrusive methods that can be complex to implement, carry a risk of errors, and may depend heavily on the structure of the application being traced. In particular, while the master application is running, the logging operations represent a workload for the operational node and can degrade its performance because of the action of the intermediate application.