Software applications running on computer systems may experience a variety of errors that may affect their operational state. Such errors include, for example, errors relating to memory allocation, memory corruption, segmentation violations, unexpected state transitions, interprocess communication between applications, and timer-related system calls. It is important for a computer system to recognize an application's operational state so that the computer system can take recovery actions and prevent the degradation of operational services.
Some software applications are capable of logging errors internally for review by the computer system user or a system manager. Other software applications are capable of generating an error report that may be transmitted outside the computer system for review by a software developer. These logs or reports typically include information about the error, such as the time at which it occurred and the nature of the error. Although these applications are capable of logging and reporting errors, no further action is typically taken during application run time. Eventually, if the errors reach a sufficiently high severity level, the result may be a software application failure or, worse, an operating system failure. These failures may cause valuable data to be lost from the application. In the event of an operating system failure, data may also be lost from other applications, and operational services may be interrupted.
Thus, what is needed is an effective method and apparatus for managing errors in a computer system to predict failures in advance and to take appropriate recovery action.