1. Statement of the Technical Field
The present invention relates to the field of error logging, and more particularly to autonomic application error detection, diagnosis and recovery.
2. Description of the Related Art
Error logging dates from the earliest of computing applications. Error logging, in the context of systems administration, typically involved the monitoring of system state and the continuous writing of log entries to a file, each log entry reflecting an error condition detected within the system. The use of an error log had been necessitated in particular by the complexity of modern computing systems and the speed at which multiple concurrent sub-systems and tasks interact with one another in the system. The system administrator, through inspection of the entries in the log, could diagnose system faults which otherwise would not be apparent by mere observation of the operation of the system.
Traditionally, error logs had been automated only to the extent that log entries could be written to the log automatically as error conditions were detected within the system. The process of reacting to logged error conditions remained manual and human-centric in nature. In many cases, though, the complexity of the system became such that a manual review of an error log often could be ineffective in diagnosing the root cause of a fault within the computing system. In any case, as computing matured to include a distributed computing model, the focus of error logging shifted from mere monitoring of conditions within low-level components to conditions surrounding the execution of ordinary computer programs. Consequently, much of the recent research and development arising in the context of error logging pertains to interoperable logging services such as the Java Commons Logging sub-project. From the interoperability perspective, advances reflected in the Java Commons Logging sub-project include a common error logging interface, common error log formats and standardized naming representations for resources.
It will be recognized by the skilled artisan that error logging can be viewed only as a portion of the solution to error processing and management. Specifically, while it can be helpful to automatically log error conditions across multiple applications and system components, the process of reviewing the error log typically occurs only subsequent to an error condition, after a period during which the operation of the computing system may have failed in its entirety. Conventional error logging facilities fail to undertake remedial measures in response to an error condition logged by the error logging facility. Yet many error conditions are recoverable, in the sense that they arise through states which easily can be overcome. Examples include inappropriate user input, insufficient resources, non-responsive or unsupported software, and the like.
Whereas error logging in general can suffice for computing systems geared towards human intervention, the same cannot be said of error logging in the context of autonomic computing. For the uninitiated, autonomic computing systems self-regulate, self-repair and respond to changing conditions, without requiring any conscious effort on the part of the computing system operator. To that end, the computing system itself can bear the responsibility of coping with its own complexity. The crux of autonomic computing relates to eight principal characteristics:
I. The system must “know itself” and include those system components which also possess a system identity.
II. The system must be able to configure and reconfigure itself under varying and unpredictable conditions.
III. The system must never settle for the status quo and the system must always look for ways to optimize its workings.
IV. The system must be self-healing and capable of recovering from routine and extraordinary events that might cause some of its parts to malfunction.
V. The system must be an expert in self-protection.
VI. The system must know its environment and the context surrounding its activity, and act accordingly.
VII. The system must adhere to open standards.
VIII. The system must anticipate the optimized resources needed while keeping its complexity hidden from the user.
In keeping with the principles of autonomic computing, an error logging facility must account not only for the automatic logging of error conditions across an entire system of application components and supporting resources, but also for the impact of any one of the logged error conditions upon the entire computing system. Specifically, it will be of paramount concern to the autonomic system that error conditions which are recoverable are processed as such. Thus, in an autonomic system it is no longer reasonable to log error conditions in the system without regard to autonomic recovery.
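By way of illustration only, the distinction between merely logging an error condition and logging it with regard to recovery might be sketched as follows. The sketch uses the standard java.util.logging package. The class RecoveringErrorLog, the Kind enumeration, the Recovery interface and the rule that every kind other than UNKNOWN is recoverable are all hypothetical names and assumptions introduced for this example, and do not describe any particular logging facility.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: all names below are assumptions made for illustration.
final class RecoveringErrorLog {

    /** Hypothetical classification of a logged error condition. */
    enum Kind { INVALID_INPUT, INSUFFICIENT_RESOURCES, UNRESPONSIVE_SOFTWARE, UNKNOWN }

    /** A remedial measure; returns true if the condition was overcome. */
    interface Recovery {
        boolean attempt();
    }

    private static final Logger LOG =
            Logger.getLogger(RecoveringErrorLog.class.getName());

    // In-memory copy of the entries, kept so the outcome can be inspected.
    private final List<String> entries = new ArrayList<>();

    /** Assumption for illustration: every kind except UNKNOWN is recoverable. */
    static boolean isRecoverable(Kind kind) {
        return kind != Kind.UNKNOWN;
    }

    /**
     * Logs the error condition and, if it is classified as recoverable,
     * attempts the supplied remedial measure. Returns true when recovered.
     */
    boolean report(Kind kind, String message, Recovery recovery) {
        entries.add(kind + ": " + message);
        LOG.log(Level.WARNING, "{0}: {1}", new Object[] {kind, message});

        if (!isRecoverable(kind) || recovery == null) {
            return false; // logged only; left for manual review
        }
        boolean recovered = recovery.attempt();
        entries.add(kind + (recovered ? ": recovered" : ": recovery failed"));
        return recovered;
    }

    /** Read-only view of the entries written so far. */
    List<String> entries() {
        return Collections.unmodifiableList(entries);
    }
}
```

In this sketch, a caller would pass a remedial measure along with the error condition, for example report(Kind.INSUFFICIENT_RESOURCES, "heap low", () -> true). A condition classified as UNKNOWN is written to the log without any recovery attempt, which corresponds to the conventional logging behavior described above.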