The present invention relates to computer operating systems, and more particularly to the resetting of the operating systems.
Computer operating systems are well known in the art. Occasionally, an operating system (OS) on a computer will encounter errors, either in hardware or software, from which the OS cannot recover. The only solution is for the OS to halt operation, i.e., enter a “hang” condition, and for the OS to be reset.
Several conventional methods currently exist in the art for resetting the OS. One conventional method requires human intervention. When the OS is halted, the person using the system takes action to reset the OS. For larger computer systems, such as those comprising a server, the requirement for human intervention is removed by a service processor separate from the system's processors. The service processor can sit and “watch” the activity of a server and determine if the OS has halted. If so, the service processor automates the resetting of the OS without human intervention. However, the service processor method is costly due to the additional hardware logic required for implementation. A service processor needs to be installed or embedded in the computer system. In a high volume system, where cost is a major factor in the design of the system, this is not a practical option.
Another conventional method uses a “Ping” type protocol over a Local Area Network (LAN). A management console somewhere within the LAN periodically looks for a managed computer on the LAN. If the console does not receive a response from the managed computer, the console assumes the OS of the managed computer is halted and will issue a system restart via the Wake on LAN/Alert On LAN technology, developed by INTERNATIONAL BUSINESS MACHINES CORPORATION. However, this solution is also costly since additional hardware is required for implementation of the management console.
Accordingly, what is needed is an improved method and system for initiating and indicating a computer reset after an operating system hang condition. The method and system should automate the resetting of an OS when in a hang condition and also be cost efficient to implement. The present invention addresses such a need.
The present invention provides a method and system for providing a reset after an operating system (OS) hang condition in a computer system, the computer system including an interrupt handler not accessible by the OS. The method includes determining if an interrupt has been generated by a watchdog timer; monitoring for an OS hang condition by the interrupt handler if the interrupt has been generated and after it is known that the OS is operating; and resetting the OS if a device driver within the OS has not set a bit in a register, the bit for indicating that the OS is operating. The method and system in accordance with the present invention use existing hardware and software within a computer system to reset the OS. The present invention uses a method by which a critical hardware watchdog periodically wakes a critical interrupt handler of the computer system. The critical interrupt handler determines if the OS is in a hang condition by polling a shared hardware register that a device driver, running under the OS, will set periodically. If the critical interrupt handler does not see that the device driver has set the register bit, it will assume the OS has hung and will reset the system. In addition, the critical interrupt handler will store a record of the reset in non-volatile memory. The reset can then be logged into the system error log. Because the method and system in accordance with the present invention use existing hardware and software within the computer system, instead of requiring an additional processor, they are cost efficient to implement while also providing a reset of the OS without human intervention.
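The interaction among the watchdog timer, the critical interrupt handler, and the device driver described above may be easier to follow as a small software simulation. The following is only an illustrative sketch under stated assumptions, not the claimed implementation: the names `SharedRegister`, `CriticalInterruptHandler`, `driver_heartbeat`, `simulate`, and `ALIVE_BIT` are hypothetical, a Python list stands in for non-volatile memory, and the sketch assumes that the handler clears the bit each time it observes it and that the OS is treated as known to be operating once the bit has been observed at least once.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

ALIVE_BIT = 0x01  # hypothetical position of the "OS is operating" bit


@dataclass
class SharedRegister:
    """Stands in for the hardware register shared by driver and handler."""
    value: int = 0


def driver_heartbeat(register: SharedRegister) -> None:
    """What the device driver running under the OS does periodically."""
    register.value |= ALIVE_BIT


@dataclass
class CriticalInterruptHandler:
    """Runs outside the OS; invoked on each watchdog timer interrupt."""
    register: SharedRegister
    os_known_running: bool = False
    nvram_log: List[str] = field(default_factory=list)  # stand-in for non-volatile memory

    def on_watchdog_interrupt(self, tick: int) -> bool:
        """Poll the shared register; return True if a reset is triggered."""
        if self.register.value & ALIVE_BIT:
            # The driver set the bit since the last poll: the OS is operating.
            self.os_known_running = True
            # Assumption: clear the bit so the driver must set it again.
            self.register.value &= ~ALIVE_BIT
            return False
        if not self.os_known_running:
            # The OS has not yet been seen operating (e.g. still booting),
            # so hang monitoring has not started.
            return False
        # Bit not set while the OS was known to be operating: assume a hang.
        self.nvram_log.append(f"OS hang reset at watchdog tick {tick}")
        self.os_known_running = False
        return True


def simulate(hang_at_tick: int, total_ticks: int) -> Tuple[Optional[int], List[str]]:
    """Drive the handler for total_ticks watchdog periods.

    The simulated driver sets the bit before every tick earlier than
    hang_at_tick and then stops, modeling an OS hang. Returns the tick at
    which a reset occurred (or None) together with the stored log.
    """
    register = SharedRegister()
    handler = CriticalInterruptHandler(register)
    for tick in range(total_ticks):
        if tick < hang_at_tick:
            driver_heartbeat(register)
        if handler.on_watchdog_interrupt(tick):
            return tick, handler.nvram_log
    return None, handler.nvram_log
```

In this sketch, calling `simulate(hang_at_tick=3, total_ticks=6)` models an OS whose driver stops setting the bit after three watchdog periods, so the handler triggers a reset on the first poll that finds the bit clear. An OS that never sets the bit is never reset, which reflects the condition that monitoring begins only after the OS is known to be operating.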