1. Field of the Invention
The present invention relates generally to data processors that provide business information services such as Web and Internet sales services, and more specifically to a fault recovery system and method for restoring a data processor from failure by issuing a fault restoration command to the data processor according to the type of trouble.
2. Description of the Related Art
The proliferation of business information services over communications networks has accelerated owing to their inherent speed and efficiency, combined with their ability to meet the individual needs of clients. As the range of applications rapidly expands, computers that provide information services to many users must be fault-tolerant, and fault tolerance is becoming an increasingly important issue.
In fault recovery systems generally known in the data processing art, the operating state of a server is constantly monitored and compared with a predetermined set of symptoms. When the detected operating state corresponds to one of the predetermined symptoms, it is determined that the server has failed, and a corresponding fault restoration command is automatically executed on the server so that the server is completely restored from failure or prevented from becoming faulty. However, the executed fault restoration command is itself one of a predetermined set of commands and cannot be altered adaptively. Since the operating performance of a server varies with time and with its individual configuration, the prior art fault recovery system cannot adapt to the varying condition of the server. Although this problem could be solved manually, the maintenance cost would be substantial because the reference settings and the corresponding fault restoration commands would need frequent alteration.
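By way of illustration only, the fixed symptom-to-command scheme described above may be sketched as follows. The symptom names, thresholds, state keys, and command names are hypothetical and do not appear in any cited reference; the sketch merely shows that both the symptoms and the command mapped to each symptom are fixed in advance.

```python
# Hypothetical symptom table: each entry pairs a predicate on the observed
# operating state with ONE fixed restoration command. All names and
# thresholds are illustrative assumptions.
SYMPTOMS = [
    ("high_cpu", lambda s: s["cpu_load"] > 0.95, "restart_service"),
    ("low_memory", lambda s: s["free_memory_mb"] < 64, "clear_cache"),
]


def select_command(state):
    """Return (symptom, command) for the first matching symptom, else None."""
    for name, matches, command in SYMPTOMS:
        if matches(state):
            return name, command
    return None
```

Because the table is static, a server whose normal load drifts over time would require manual editing of the thresholds and commands, which is the maintenance burden noted above.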
Japanese Patent Publication 1995-54474-B2 discloses a fault recovery system in which a set of different fault restoration commands is provided for each detected operating condition of a server. In this system, the commands of each set are sequentially executed according to a predetermined order of priority. Although the execution of different commands may be effective for solving some types of trouble, the order of command execution is fixed and unalterable, and hence the commands executed in the early stage of restoration are not necessarily optimal for a particular problem. A repeated cycle of futile command executions is likely to occur, with the result that a long time is taken to restore the server from failure. Additionally, the repeated futile command executions could trigger other troubles. Consequently, either the use of commands that are likely to trigger other troubles would have to be restricted, or their priority would have to be manually altered to map the commands to specific service configurations. This would result in a narrow range of usable commands, and the maintenance cost of the server would grow in proportion to the increasing complexity of the system.
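The fixed-priority sequential execution described above may be illustrated by the following minimal sketch. It is not taken from the cited publication; the function name, the command names, and the `execute` callback (assumed to return True when the server is restored) are hypothetical.

```python
def restore_in_fixed_order(commands, execute):
    """Try commands in their fixed priority order.

    `execute` is an assumed callback returning True when the command
    restores the server. Returns (successful command or None, attempts).
    """
    attempted = []
    for command in commands:
        attempted.append(command)
        if execute(command):
            return command, attempted
    return None, attempted


# Hypothetical case in which only the lowest-priority command is effective.
fixed_order = ["restart_process", "clear_cache", "reboot"]
winner, attempts = restore_in_fixed_order(fixed_order, lambda c: c == "reboot")
```

In this hypothetical case the two higher-priority commands are executed in vain before the effective one is reached, and the same futile sequence would be repeated on every recurrence of the trouble because the order never changes.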
Japanese Patent Publication 2002-251295-A discloses a fault recovery system that includes a knowledge assistance tool of the kind previously used by maintenance personnel. The assistance tool is a collection of past records and provides a mapping of troubles to fault recovery commands. The fault recovery system assigns priority levels to the recovery commands and modifies the priority levels according to the degree of similarity between a past operating state and the currently detected state. However, manual input is required for mapping past troubles to commands, and hence the effectiveness of a command execution largely depends on the competence of the operator. It is therefore uncertain whether a command selected according to the assistance tool will actually resolve the trouble.
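The similarity-based adjustment of priority levels described above may be sketched as follows. The cited publication's actual similarity measure and update rule are not given here, so the sketch assumes, purely for illustration, a similarity of 1 / (1 + Euclidean distance) and an additive priority update; all names and values are hypothetical.

```python
import math


def similarity(past_state, current_state):
    """Assumed measure: 1 / (1 + Euclidean distance); both states share keys."""
    dist = math.sqrt(
        sum((past_state[k] - current_state[k]) ** 2 for k in past_state)
    )
    return 1.0 / (1.0 + dist)


def rank_commands(records, base_priority, current_state):
    """Rank commands by base priority plus similarity of their past records.

    `records` is a list of (past_state, command) pairs, assumed to have
    been entered manually by an operator.
    """
    priority = dict(base_priority)
    for past_state, command in records:
        score = similarity(past_state, current_state)
        priority[command] = priority.get(command, 0.0) + score
    return sorted(priority, key=priority.get, reverse=True)


# Hypothetical past records and current state.
records = [
    ({"cpu": 0.9, "mem": 0.2}, "restart_service"),
    ({"cpu": 0.1, "mem": 0.9}, "clear_cache"),
]
base = {"restart_service": 0.0, "clear_cache": 0.0}
ranking = rank_commands(records, base, {"cpu": 0.9, "mem": 0.2})
```

The ranking is only as reliable as the manually entered records: if an operator mapped a past trouble to an ineffective command, a high similarity score would still promote that command.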
Therefore, there exists a need for a fault recovery system and method that eliminate the shortcomings of the prior art fault recovery systems.