The present invention relates generally to the automatic detection and remedy of software lock-up conditions of a computer system without human intervention, and more specifically, to the use of watchdog timers to initiate the automatic detection and remedy.
Many computer systems incorporate watchdog timers to recover from lock-up conditions. Some microprocessors are programmed with this capability. A watchdog timer works by being restarted often enough that it does not expire unless either there is a lock-up condition in the code or the code fails to restart the timer within the watchdog timer duration. Watchdog timers are fairly simple to use, but require the programmer to "hit" the watchdog timer during routines which can take a significant amount of time.
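The behavior of a conventional watchdog timer described above can be illustrated with a minimal simulation. This is only a sketch under stated assumptions: time is modeled as discrete ticks, and the names `WatchdogTimer`, `hit`, `tick`, and `run_routine` are hypothetical and do not come from any particular microprocessor or library.

```python
class WatchdogTimer:
    """Simulated countdown timer that must be restarted ("hit") before it expires."""

    def __init__(self, duration_ticks):
        self.duration_ticks = duration_ticks
        self.remaining = duration_ticks
        self.expired = False

    def hit(self):
        """Restart the countdown; has no effect once the timer has expired."""
        if not self.expired:
            self.remaining = self.duration_ticks

    def tick(self):
        """Advance simulated time by one tick."""
        if self.expired:
            return
        self.remaining -= 1
        if self.remaining <= 0:
            self.expired = True


def run_routine(watchdog, work_ticks, hit_every=None):
    """Simulate a routine lasting work_ticks ticks.

    If hit_every is given, the routine hits the watchdog every hit_every ticks.
    Returns True if the routine finished before the watchdog expired.
    """
    for t in range(1, work_ticks + 1):
        watchdog.tick()
        if watchdog.expired:
            return False  # on real hardware this would be a reset or interrupt
        if hit_every is not None and t % hit_every == 0:
            watchdog.hit()
    return True
```

In this sketch, a routine that runs for 20 ticks against a 5-tick watchdog completes only if the programmer remembers to hit the timer more often than every 5 ticks, which is the burden the background section describes.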
Accordingly, there is a need in the art for a watchdog timer that automatically detects conditions leading to a lock-up condition of the computer system and remedies such conditions, thereby preventing a computer system lock-up without human intervention.
It is, therefore, an object of the present invention to provide a method and apparatus for automatically detecting conditions leading to a lock-up condition of the computer system and remedying such conditions to prevent a computer system lock-up.
It is another object of the present invention to provide a method and apparatus for automatically detecting conditions leading to a lock-up condition of the computer system which operates independently from the operating system of the computer system.
The purpose of the Software Sanity Monitor according to the present invention is to automatically detect and remedy software lock-up conditions without user intervention. Users often refer to these conditions as "hangs" or "forever loops". Although the Software Sanity Monitor uses the operating software's information, it is designed to execute independently of the operating system software, thereby eliminating reliance on a "sane" operating system. If a "hang" condition is detected, the Software Sanity Monitor will automatically restart the system after logging the failure and, optionally, notify the user or host system.
The Software Sanity Monitor is designed for, but not limited to, devices not having console input. The Software Sanity Monitor is designed to run in an operating environment where programs vary in run-time priority. In addition, the Software Sanity Monitor is designed to run in operating environments where any proportion of the programs may have the same run-time priority. The Software Sanity Monitor design does not apply to environments that are solely "time-sliced". Although the Software Sanity Monitor is designed to detect whether or not the system software is running properly, it does not determine whether or not any particular program is producing proper results.
These and other objects of the present invention are achieved by a computer-implemented method of preventing a computer system lock-up including starting a first timer. A second timer is monitored and it is determined when the second timer periodic time interval elapses. Operating software scheduling information of the computer system is sampled to verify lower priority programs have continued to run. If the lower priority programs have continued to run, the first timer is restarted such that the first timer does not interrupt the computer system. The second timer is restarted. If lower priority programs have not continued to run, the first timer is allowed to expire and interrupt the computer system. Control of the computer system is then taken by a monitoring program.
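The two-timer method summarized above can be sketched as a small simulation. This is an illustration under stated assumptions rather than the implementation of the invention: time is modeled as discrete ticks, the scheduling information is assumed to be a per-program run counter that starts at zero, and "continued to run" is assumed to mean that every lower priority program's counter advanced since the previous sample. The names `SanityMonitor`, `simulate`, `logger`, and `idle` are hypothetical.

```python
class SanityMonitor:
    """Simulated monitor: the first timer is a watchdog, the second is periodic."""

    def __init__(self, watchdog_ticks, period_ticks, low_priority_names):
        self.watchdog_ticks = watchdog_ticks
        self.period_ticks = period_ticks
        self.low_priority_names = list(low_priority_names)
        self.watchdog_remaining = watchdog_ticks  # first timer
        self.period_remaining = period_ticks      # second timer
        # Assumption: run counters start at zero when the monitor starts.
        self.last_sample = {name: 0 for name in self.low_priority_names}
        self.took_control = False

    def tick(self, run_counts):
        """Advance one tick; run_counts maps program name to times scheduled."""
        if self.took_control:
            return
        self.watchdog_remaining -= 1
        self.period_remaining -= 1
        if self.period_remaining == 0:
            # Second timer elapsed: sample the scheduling information.
            sample = {n: run_counts.get(n, 0) for n in self.low_priority_names}
            if all(sample[n] > self.last_sample[n] for n in sample):
                # Lower priority programs ran: restart the first timer.
                self.watchdog_remaining = self.watchdog_ticks
            self.last_sample = sample
            self.period_remaining = self.period_ticks  # restart the second timer
        if self.watchdog_remaining <= 0:
            # First timer expired: the monitoring program takes control.
            self.took_control = True


def simulate(total_ticks, hang_at=None):
    """Return the tick at which the monitor took control, or None if it never did.

    From tick hang_at onward, the lower priority programs are starved,
    as if a higher priority program had entered a forever loop.
    """
    run_counts = {"logger": 0, "idle": 0}
    monitor = SanityMonitor(watchdog_ticks=5, period_ticks=2,
                            low_priority_names=list(run_counts))
    for t in range(1, total_ticks + 1):
        if hang_at is None or t < hang_at:
            for name in run_counts:
                run_counts[name] += 1
        monitor.tick(run_counts)
        if monitor.took_control:
            return t
    return None
```

In this sketch the first timer's duration is chosen longer than the second timer's period, so a healthy system always restarts the first timer before it can expire. Once the lower priority programs stop making progress, the periodic check simply declines to restart the first timer, and its expiry hands control to the monitoring program.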
The foregoing and other objects of the present invention are achieved by an article including a computer readable medium having stored thereon a plurality of sequences of instructions, said plurality of sequences of instructions including sequences of instructions which, when executed by a processor, cause said processor to perform the steps of starting a first timer. A second timer is monitored and it is determined when the second timer periodic time interval elapses. Operating software scheduling information of the computer system is sampled to verify lower priority programs have continued to run. If the lower priority programs have continued to run, the first timer is restarted such that the first timer does not interrupt the computer system. The second timer is restarted. If the lower priority programs have not continued to run, the first timer is allowed to expire and interrupt the computer system. Control of the computer system is taken by a monitoring program.
The foregoing and other objects of the present invention are achieved by a computer architecture including starting means for starting a first timer. Monitoring means monitor a second timer and it is determined when the second timer periodic time interval elapses. Sampling means sample operating software scheduling information of the computer system to verify lower priority programs have continued to run. If the lower priority programs have continued to run, the first timer is restarted such that the first timer does not interrupt the computer system. The second timer is restarted. If the lower priority programs have not continued to run, the first timer is allowed to expire and interrupt the computer system. Control of the computer system is taken by a monitoring program.
The foregoing and other objects of the present invention are achieved by a computer system including a processor and a memory coupled to the processor, the memory having stored therein sequences of instructions, which, when executed by the processor, cause the processor to perform the steps of starting a first timer. A second timer is monitored and it is determined when the second timer periodic time interval elapses. Operating software scheduling information of the computer system is sampled to verify lower priority programs have continued to run. If the lower priority programs have continued to run, the first timer is restarted such that the first timer does not interrupt the computer system. The second timer is restarted. If the lower priority programs have not continued to run, the first timer is allowed to expire and interrupt the computer system. Control of the computer system is taken by a monitoring program.
Still other objects and advantages of the present invention will become readily apparent to those skilled in the art from the following detailed description, wherein the preferred embodiments of the invention are shown and described, simply by way of illustration of the best mode contemplated of carrying out the invention. As will be realized, the invention is capable of other and different embodiments, and its several details are capable of modifications in various obvious respects, all without departing from the invention. Accordingly, the drawings and description thereof are to be regarded as illustrative in nature, and not as restrictive.