The present invention relates to methods and apparatuses for controlling the operation of a digital processing system, and more particularly, to methods and apparatuses for automatically controlling the operation of the system in response to certain fault conditions.
Modern digital processing systems, such as computer systems, can often operate autonomously without any human user interaction or intervention. For example, modern web servers and other servers such as file servers can perform numerous operations without any intervention by a local operator of that computer system. It will be appreciated that client systems which log on to and use the server do interact with the server; however, these are not interactions by the local operator of the computer system. Client systems which log on to a server make requests to the server through a communication interface, such as a network adapter, modem, or other device. The local operator normally controls the computer system locally by using a keyboard or a pointing device such as a mouse.
While modern digital processing systems can perform many operations without intervention by a local operator, it is often necessary to intervene in the operation of the computer to rectify a fault condition. When the operator is locally present, it is an easy operation for the operator to restart the computer system or otherwise deal with the fault condition. However, when the operator is remotely located relative to the computer system, this intervention typically requires traveling to the site where the computer system is operating. This travel is at least an inconvenience. One prior approach for solving this problem is described in U.S. Pat. No. 5,347,167 by Amar Singh of Sophisticated Circuits, Inc. of Bothell, Wash. This patent describes a technique for remotely restarting a computer system once it has been determined by a human user that the computer system has crashed or otherwise needs to be restarted. The remotely located human user makes a telephone call to a control device co-located with the computer system which needs to be restarted, and through this telephone call, the control device causes the entire computer system to be restarted.
While this approach avoids the need to travel to the computer system which has failed, it still requires that a human user detect the failed computer system by remotely monitoring the operation of the computer system.
Thus, it is desirable to provide an improved method for controlling the operation of a computer system.
The present invention describes methods and apparatuses for controlling the operation of a digital processing system. In one example of a method of the invention, a request is repeatedly generated for the digital processing system, and a response to the request is normally provided by the digital processing system when it is not in a fault state (e.g. when not crashed). If the digital processing system is in a fault state then no response is provided, and a control device automatically restarts the digital processing system.
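By way of illustration only, the request and response method described above might be sketched in Python as follows. The names `monitor`, `send_request`, and `restart` are hypothetical and do not appear in this description; the sketch assumes that a missing response is reported as `False` and that the restart action is supplied by the caller.

```python
from typing import Callable


def monitor(send_request: Callable[[], bool],
            restart: Callable[[], None],
            polls: int) -> int:
    """Repeatedly generate a request; restart when no response is provided.

    send_request is a hypothetical callable that returns True when the
    digital processing system responds and False when it does not.
    Returns the number of restarts that were triggered.
    """
    restarts = 0
    for _ in range(polls):
        if not send_request():
            # No response: treat the system as being in a fault state.
            restart()
            restarts += 1
    return restarts


if __name__ == "__main__":
    # Simulated system that fails to respond to the third request.
    responses = iter([True, True, False, True])
    count = monitor(lambda: next(responses),
                    lambda: print("restarting system"),
                    polls=4)
    print("restarts:", count)
```

In an actual control device the loop would typically run indefinitely and wait a fixed interval between requests; a bounded poll count is used here only so that the sketch terminates.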
In another example of a method of the invention, a first software program, which is being executed on a digital processing system, provides a first status indicator to a control device which is coupled to the digital processing system. Typically, the status indicator is periodically and repeatedly provided to the control device when the digital processing system is not in a fault state. When the status indicator is not provided to the control device, a timer in the control device causes the control device to restart the digital processing system. The status indicator, when received by the control device, normally resets the timer so that the digital processing system is not restarted. The first software program may receive a second status indicator from a second software program (which may be considered to be a server application in certain examples); receipt of this second status indicator indicates that the second software program is not in a fault state. The first software program has a timer which expires after a period of time; if the second status indicator is not received within that period, the first software program may specify a fault condition to the control device (e.g. by not resetting a timer in the control device). The specification of the fault condition will typically cause the control device to restart the digital processing system or to perform some other actions which have been selected by a user to respond to the fault condition.
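The two timers described above might be sketched in Python as follows, purely as an illustration under stated assumptions. The class names `ControlDevice` and `MonitorProgram`, the helper `simulate`, and the use of integer time steps are all hypothetical; the sketch assumes that the first software program specifies a fault condition simply by withholding the first status indicator.

```python
class ControlDevice:
    """Hypothetical control device with a resettable timer."""

    def __init__(self, timeout: int) -> None:
        self.timeout = timeout
        self.deadline = timeout  # assumes the timer starts at time 0

    def status_indicator(self, now: int) -> None:
        # Receipt of the first status indicator resets the timer.
        self.deadline = now + self.timeout

    def tick(self, now: int) -> bool:
        # Return True if the timer expired and a restart was triggered.
        if now >= self.deadline:
            self.deadline = now + self.timeout
            return True
        return False


class MonitorProgram:
    """Hypothetical first software program with its own timer."""

    def __init__(self, device: ControlDevice, app_period: int) -> None:
        self.device = device
        self.app_period = app_period
        self.last_app_status = 0

    def app_status(self, now: int) -> None:
        # Second status indicator, received from the server application.
        self.last_app_status = now

    def tick(self, now: int) -> None:
        # Forward the first status indicator only while the application
        # has reported within app_period; otherwise withhold it.
        if now - self.last_app_status <= self.app_period:
            self.device.status_indicator(now)


def simulate(app_alive_until: int, steps: int) -> list:
    """Return the time steps at which the control device restarts."""
    device = ControlDevice(timeout=3)
    program = MonitorProgram(device, app_period=2)
    restart_times = []
    for now in range(1, steps + 1):
        if now <= app_alive_until:
            program.app_status(now)
        program.tick(now)
        if device.tick(now):
            restart_times.append(now)
    return restart_times


if __name__ == "__main__":
    print(simulate(app_alive_until=4, steps=10))
```

Time is passed in explicitly rather than read from a clock so that the behavior is repeatable. A real control device would use a hardware or operating system timer, and the restart would also reset the first software program, which this sketch does not model.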