1. Field of the Invention
The present invention is in the field of process monitoring and reestablishment (restarting) in the event that a process termination has occurred.
2. Description of the Prior Art
As used herein, “process” means any process and event that, from an information technology viewpoint, is executed on a computer, a computer network and/or on associated peripheral apparatuses or other connected apparatuses. An application can thus trigger a number of processes. The user typically does not take notice of the invocation or the execution of the underlying processes. For example, when a text processing application is executed, the application may invoke a process that concerns the printer driver and printer controller, processes that execute on the CPU, and possibly even further processes that control access to network components.
In a complex application environment, there are normally a number of applications that may be functionally linked to one another (for example via calls) and thus exhibit interdependencies (in terms of process technology). If one such process must be terminated, the following problems can arise:

Consecutive faults (inherited errors; subsequent errors) can occur upon a cancellation (abort) of a process when, due to the dependency between the individual processes, restarting the one failed process is not sufficient and a restart strategy for multiple processes is necessary. This is the case when the failure of the one process results in failures of other processes.

Due to the interdependencies, it is often necessary to reestablish or to restart the respective processes in a predetermined order. This restart order cannot be determined without a reestablishment strategy on a superordinate level that takes into account the dependencies among the processes.
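The two problems above (failures that propagate to dependent processes, and the need for a dependency-aware restart order) can be illustrated with a minimal sketch. It is not taken from any system described here. The process names, the `DEPENDS_ON` map, and the functions `affected_by` and `restart_order` are hypothetical, and a topological sort is assumed as one possible way to derive a restart order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: each process -> the processes it depends on.
DEPENDS_ON = {
    "database": set(),
    "app_server": {"database"},
    "print_service": {"app_server"},
    "web_frontend": {"app_server"},
}


def affected_by(failed, depends_on):
    """Return the failed process plus every process that depends on it,
    directly or indirectly (the potential consecutive faults)."""
    affected = {failed}
    changed = True
    while changed:
        changed = False
        for proc, deps in depends_on.items():
            if proc not in affected and deps & affected:
                affected.add(proc)
                changed = True
    return affected


def restart_order(failed, depends_on):
    """Return the affected processes ordered so that each process is
    restarted only after the processes it depends on."""
    affected = affected_by(failed, depends_on)
    # Restrict the graph to affected processes; unaffected ones keep running.
    subgraph = {proc: depends_on[proc] & affected for proc in affected}
    return list(TopologicalSorter(subgraph).static_order())


if __name__ == "__main__":
    print(restart_order("app_server", DEPENDS_ON))
```

In this sketch a failure of `app_server` pulls `print_service` and `web_frontend` into the restart set, and `app_server` is always placed first. The relative order of the two dependents is not fixed, since neither depends on the other.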
A further problem with reestablishment processes is that, although in principle all processes should be monitored for failures, there are exceptional situations in which individual or multiple selected processes should be excluded from the monitoring and reestablishment. This can particularly be the case when a process undergoes maintenance. Moreover, it is possible that, after the first failure of a process, repeated restarts of this process have been attempted without success. The restart process is then terminated after a specific number of attempts, and the presence of a systematic error is assumed that must first be remedied by further recovery measures. In this case it is reasonable to exclude this process at least from the reestablishment, but not from the monitoring. In known prior art systems, a major source of errors arises when a (normally manually) excluded process is also excluded from the monitoring. Previously, after the completion of the maintenance, such a (deactivated) process had to be explicitly reintroduced into the monitoring via an active step by the system administrator. If this active re-insertion of a manually excluded process is forgotten, a security gap exists and the process remains unmonitored. This can lead to severe consecutive faults.
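The distinction between excluding a process from reestablishment and excluding it from monitoring can be sketched as follows. This is an illustrative sketch only, not an implementation from the prior art discussed here. The names `ProcessState`, `handle_failure`, and `try_restart`, as well as the limit of three attempts, are assumptions made for the example.

```python
from dataclasses import dataclass

MAX_RESTART_ATTEMPTS = 3  # assumed limit; the text only says "a specific number"


@dataclass
class ProcessState:
    name: str
    restart_enabled: bool = True  # False: excluded from reestablishment only
    failed_attempts: int = 0      # consecutive unsuccessful restart attempts
    failures_seen: int = 0        # every detected failure, restart or not


def handle_failure(state, try_restart):
    """React to one detected failure of the process described by `state`.

    `try_restart` is a hypothetical callable that takes the process name
    and returns True if the restart succeeded.
    """
    # Monitoring is never switched off: every failure is recorded.
    state.failures_seen += 1
    if not state.restart_enabled:
        return "monitored-only"
    if try_restart(state.name):
        state.failed_attempts = 0
        return "restarted"
    state.failed_attempts += 1
    if state.failed_attempts >= MAX_RESTART_ATTEMPTS:
        # Assume a systematic error: stop restarting, keep monitoring.
        state.restart_enabled = False
        return "restart-suspended"
    return "retry-pending"
```

Because the exclusion flag only gates the restart branch, a process whose restarts have been suspended still has its failures recorded, so no separate step is needed to bring it back under monitoring.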
Central monitoring of computer-based units or computers in a network via the use of agents or log files is known in the prior art.
JP10326208 discloses an error correction system in which a number of components are likewise monitored. An error analysis and a corresponding error correction are executed for each component.
EP920155 discloses a monitoring system for a number of computer agents that are connected to a central station via a network. The individual agents communicate errors to the central station. It is possible for the central station also to analyze errors that concern, for example, the network or communication. The status of each of the agents is additionally processed.