The present invention relates generally to a method of rebooting an operating system in a computer or a computer system. More particularly, the invention is concerned with a method of rebooting an operating system when it halts due to the occurrence of a software failure.
In some operating systems, device driver modules designed for controlling hardware connected to a computer are provided separately from the kernel. A device driver module called by the kernel is loaded for use from a secondary storage such as a magnetic disk storage or the like. In conjunction with an operating system having this function, it is known to enable the computer to control a timing-critical system (i.e., a system subject to severe timing restrictions) by making use of a real-time processing device driver which is designed to snatch a clock interrupt from the operating system. In other words, the clock interrupt which would ordinarily be accepted by the operating system is snatched by the real-time processing device driver, which executes the relevant real-time processing with priority over the processing of the operating system. After the execution, control is transferred back to the operating system.
As a concrete example of such a system or scheme, there may be mentioned the one described in an article entitled "The RTX Real-Time Subsystem for Windows NT", USENIX Windows NT Workshop, Aug. 11-13, 1997, pp. 33-37. According to this known scheme, an interrupt issued to the operating system by a device subject to real-time control is snatched by modifying a module of the operating system and by using a special device driver, whereupon the processing for the snatched interrupt is executed by a program which is independent of the operating system. By virtue of such an arrangement, the interrupt can be processed independently of the operating system, whereby the real-time performance of the computer is enhanced.
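The interrupt-snatching arrangement described above can be illustrated by a minimal simulation. This is only a sketch under stated assumptions, not the mechanism of the cited article: the names VectorTable, CLOCK_VECTOR and snatch are hypothetical, and a dictionary of callables stands in for the processor's interrupt vector table.

```python
from typing import Callable, Dict

# Hypothetical vector number for the clock interrupt (illustration only).
CLOCK_VECTOR = 0


class VectorTable:
    """Stand-in for an interrupt vector table: maps a vector number to a handler."""

    def __init__(self) -> None:
        self.handlers: Dict[int, Callable[[], None]] = {}

    def install(self, vector: int, handler: Callable[[], None]) -> None:
        """Register the handler that runs when the given vector is raised."""
        self.handlers[vector] = handler

    def raise_interrupt(self, vector: int) -> None:
        """Simulate delivery of an interrupt by calling the installed handler."""
        self.handlers[vector]()


def snatch(table: VectorTable, vector: int, rt_work: Callable[[], None]) -> None:
    """Replace the current handler with one that runs real-time work first."""
    original = table.handlers[vector]

    def hooked() -> None:
        rt_work()    # real-time processing runs with priority
        original()   # afterwards, control is transferred back to the OS handler

    table.install(vector, hooked)
```

If an operating-system handler is installed for CLOCK_VECTOR and snatch is then applied, each simulated clock interrupt runs the real-time work before the operating-system handler.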
In real-time processing, it is equally important to ensure high reliability. In the case of the known system mentioned above, the real-time processing module is designed to be independent of the kernel of the operating system. Accordingly, the real-time processing can be executed continuously even when the operating system halts due to the occurrence of a software failure. Furthermore, when the operating system stops due to a software failure, the real-time processing module is notified of this fact. Thus, the real-time processing module can execute processing for coping with the stoppage of the operating system. In the known system disclosed in the above literature, the processing for the interrupt issued by the device subject to real-time control is so controlled that execution of the interrupt processing can be continued regardless of stoppage of the operating system due to the occurrence of a failure.
However, in the conventional systems known heretofore, inclusive of the system mentioned above, the real-time processing device driver is caused to stop when the operating system, having stopped due to the occurrence of a software failure, is rebooted. In other words, the conventional system suffers from the problem that the processing for rebooting the operating system cannot be executed simultaneously with the real-time processing. This can be explained by the fact that upon rebooting of the operating system, the relevant processor is reset, whereby the data required for the virtual address translator and the data for the interrupt processing are lost. This problem is more serious in a system incorporating hardware which has to be controlled periodically at a very short time interval without being stopped, because the control of such hardware will be suspended by the operating system rebooting operation. In this conjunction, it should also be mentioned that in the conventional systems, the operating system can accept neither the clock interrupt nor the external interrupt issued by the hardware while the operating system is being rebooted.
By way of example, consider a computer system of cluster configuration including a plurality of computers. In such a system, one of the computers periodically inquires, at a predetermined interval, whether the other computers are operating. In case no response is issued from one of the computers over a predetermined time span or period, it is decided that the computer issuing no response has stopped, whereupon processing for modifying or altering the system configuration is executed. In that case, the decision that the computer is not operating can be made only after the lapse of a predetermined waiting time. In this conjunction, it will be noted that unless the interrupt processing can be executed during rebooting of the operating system, a longer time will have to be set as the waiting time mentioned above, since a computer which is merely rebooting would otherwise be judged to have stopped. Thus, a long time is taken before the reconfiguration of the computer system can be started, giving rise to a problem. As can readily be appreciated, if the external interrupt for the inquiry mentioned above can be accepted, with the response being sent back, even in the course of rebooting the operating system, the waiting time can be reduced, which in turn means that the time taken for starting the reconfiguration of the computer system can be shortened.
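The relation between the waiting time and the ability to respond during rebooting can be illustrated as follows. This is a minimal sketch under stated assumptions: the names HeartbeatMonitor and required_timeout are hypothetical, and the formula (one inquiry interval plus the period during which a healthy computer may remain silent) is a simplifying assumption for illustration, not a rule taken from any particular cluster system.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HeartbeatMonitor:
    """Records the last response time of each computer and flags silent ones."""

    timeout: float  # waiting time before a computer is judged to have stopped
    last_reply: Dict[str, float] = field(default_factory=dict)

    def record_reply(self, node: str, now: float) -> None:
        """Note that the given computer responded to an inquiry at time 'now'."""
        self.last_reply[node] = now

    def stopped_nodes(self, now: float) -> List[str]:
        """Return the computers that have been silent for longer than the timeout."""
        return sorted(
            node for node, t in self.last_reply.items() if now - t > self.timeout
        )


def required_timeout(
    inquiry_interval: float, reboot_time: float, responds_during_reboot: bool
) -> float:
    """Assumed lower bound on the waiting time that avoids misjudging a reboot.

    If a computer cannot respond while its operating system reboots, the
    waiting time must also cover the reboot; otherwise one inquiry interval
    is taken as sufficient (illustrative assumption).
    """
    silent_period = 0.0 if responds_during_reboot else reboot_time
    return inquiry_interval + silent_period
```

Under these assumptions, with an inquiry interval of 1 second and a reboot time of 60 seconds, the waiting time falls from 61 seconds to 1 second once the inquiry can be answered during rebooting, so reconfiguration can begin correspondingly sooner.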
It should additionally be mentioned that the rebooting method known heretofore requires a long time before the operating system can start operating, because processing such as a memory check, verification of the hardware configuration and the like has to be executed, which poses a problem as well.