1. Field of the Invention
The present invention relates to a method and apparatus for controlling a computer system in the case where a fault occurs in an application program. Typically, a computer system includes an operating system program which constitutes a given operating system and which is adapted to carry out the control of the operations of the computer system as a whole. Further, the computer system also includes one or a plurality of application programs which are executable by such an operating system, and which are provided separate from the operating system program.
More particularly, the present invention relates to a technique for handling a fault or failure in an application program without lowering the operational efficiency of the computer system, even when the fault or failure occurs in a certain application program arranged in an address space different from that of any other application program.
2. Description of the Related Art
In recent years, most computer systems have system configurations in which an operating system (sometimes abbreviated to "OS"), for managing and controlling operations of the computer system as a whole, is provided, perfectly distinct (i.e., completely separate) from application programs executable by such an operating system. In such a configuration, an address space for the operating system and the address space for each of the application programs are arranged, definitely (i.e., clearly and distinctly) separate from each other. Further, a program space of the operating system and a program space for each of the application programs are also arranged, definitely separate from each other.
In such a system, an executable operation mode carried out by the OS is usually referred to as a system mode, while an executable operation mode carried out by an application program is usually referred to as a user mode. Further, the program space of the OS is referred to as a system space or a system area.
In the above-mentioned computer system, an abnormality sometimes occurs in a certain application program, e.g., a fault due to a runaway of the application program, an illegal access operation by the faulty application program, or the like. Such a fault is usually detected through an exception interrupt issued by the application program in which an abnormality occurs, a validity check of various data and programs by the OS, or the like.
In such a system, when the above-mentioned abnormality has been detected by the OS, a process for terminating the execution of the program due to the fault is carried out for the faulty application program (hereinafter, the application program in which an abnormality occurs will be referred to as "the faulty application program").
At this time, in order to determine the cause of the above-mentioned abnormality, e.g., a fault in the faulty application program (i.e., the failing application program), some information necessary for analyzing faults is usually collected and transferred to a given file, or the like. Such information generally includes fault information obtained from a memory dump in a memory area of the program space of the faulty application program, and management information, for controlling application programs, which is provided in the system space.
In this case, it should be noted that the program space for each application program is separated from any other program space. In such a configuration of program space, access to the above-mentioned information regarding a fault of the faulty application program can be carried out only by using the system mode of an OS.
Namely, it is difficult to obtain the above-mentioned information regarding a fault of the faulty application program by using the other, normal application programs, which are provided independently of the faulty application program. Further, the management information, etc., which the OS maintains in its system space, may be changed by other operations of the computer system.
Therefore, in the prior art, the OS per se is designed to collect some information necessary for analyzing faults, in the case where a certain application program fails, so that the fault or failure of the application program can be accurately detected and analyzed.
As mentioned above, in the case where the OS per se collects the information regarding a fault or failure of the faulty application program, a process for collecting the information regarding a fault and storing the information into a given file is executed, in the system mode, by the OS, prior to any other process. Namely, the process for collecting and saving the information regarding a fault or failure of the faulty application program has priority over any other process of the remaining application programs, regardless of the priority that the faulty application program had before the fault occurred.
In such a configuration, even though some of the remaining application programs may have a higher priority than the faulty application program, the execution of the remaining application programs is delayed. Consequently, a problem occurs in that the operational efficiency of the computer system may deteriorate.
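The delay described above can be illustrated with a minimal sketch. It is not taken from any actual operating system; the names Task and completion_times, the priority convention, and all of the time values are hypothetical and chosen only for illustration. The sketch assumes that the OS spends a fixed time collecting the information regarding a fault in the system mode before any application program is allowed to run.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int   # assumed convention: a larger value means more urgent
    run_time: int   # CPU time units the task needs (hypothetical values)


def completion_times(tasks, dump_time):
    """Return {task name: finish time} when the OS first spends dump_time
    units collecting fault information, then runs the remaining tasks in
    descending order of priority."""
    # The collection runs in the system mode ahead of every application
    # program, whatever priority the faulty program or the others have.
    clock = dump_time
    finished = {}
    for task in sorted(tasks, key=lambda t: -t.priority):
        clock += task.run_time
        finished[task.name] = clock
    return finished


tasks = [Task("high_priority_app", 10, 2), Task("low_priority_app", 1, 3)]
without_dump = completion_times(tasks, dump_time=0)
with_dump = completion_times(tasks, dump_time=5)
print(without_dump)
print(with_dump)
```

In this sketch every remaining application program, including the one with the highest priority, finishes later by exactly the time spent on collecting the information regarding the fault.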
Recently, higher performance has been required for computer systems, and simultaneously the size of application programs has also increased. Therefore, the program space, which is allocated to application programs, tends to remarkably increase. Further, the amount of the information regarding a fault, which must be collected when the fault occurs in a certain application program, tends to be remarkably increased.
When the program space and the amount of the information regarding a fault increase remarkably, as described above, the time required by the OS to collect the information regarding the fault and to save such information also increases. As this time increases, the time of a central processing unit (CPU) that can be allocated to a plurality of normal application programs is likely to decrease accordingly.
Consequently, the performance of the whole computer system may fall. In other words, an increase in the time necessary for an OS to collect and save the information regarding a fault may have a serious influence, or impact, on the performance of the whole computer system.
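The relationship between the amount of the information regarding a fault and the CPU time left for the normal application programs can be sketched as simple arithmetic. The following is only an illustrative model under stated assumptions: the collection is assumed to proceed at a constant rate, and the function names, the observation window, and all numerical values are hypothetical rather than measured.

```python
def dump_seconds(space_bytes, bytes_per_second):
    """Time needed to collect and save a program space of the given size,
    assuming a constant (hypothetical) collection rate."""
    return space_bytes / bytes_per_second


def cpu_share_for_normal_apps(window_seconds, space_bytes, bytes_per_second):
    """Fraction of an observation window left for the normal application
    programs when the collection occupies the CPU first."""
    busy = min(window_seconds, dump_seconds(space_bytes, bytes_per_second))
    return (window_seconds - busy) / window_seconds


# Hypothetical figures: a 10-second window and a rate of 50 MB per second.
small = cpu_share_for_normal_apps(10, 100_000_000, 50_000_000)
large = cpu_share_for_normal_apps(10, 400_000_000, 50_000_000)
print(small, large)
```

Under these assumed figures, a fourfold increase in the program space to be collected reduces the share of the window available to the normal application programs from about 0.8 to about 0.2.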
Further, according to a technique for handling a fault in the prior art, the information regarding a fault is collected in the system mode, by the OS. Namely, a process for collecting the information regarding a fault is executed for every application program to the same degree. Therefore, extra information, other than the information necessary for individually analyzing each application program, is likely to be collected. On the other hand, important information, actually necessary for analyzing the faulty application program, may be omitted, owing to the large amount of information which is to be collected.
Further, a situation sometimes occurs in which it becomes necessary for the faulty application program to continue executing some subsequent process, even after a process for terminating the execution of the program due to the fault or failure has been carried out for the faulty application program. In this case, a request for the execution of the above-mentioned subsequent process is made to the faulty application program during the time period, from the time when the faulty application program is activated again and an initialization of the program is completed, until the time when the application program again becomes executable.
Even though such a situation occurs, it is not possible for an OS to meet this requirement. Consequently, another problem occurs in that the contents of the request for the execution of the above-mentioned subsequent process are treated as an abnormality, or disregarded.
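The handling of such requests in the prior art can be sketched as follows. This is a simplified model, not an actual OS interface: the function name handle_requests and the time values are hypothetical, and the model assumes, for simplicity, a single window that lasts from the termination of the faulty application program until the program again becomes executable.

```python
def handle_requests(arrival_times, fault_time, ready_time):
    """Classify each request by its arrival time under the prior-art
    behavior: nothing holds a request that arrives while the faulty
    application program is not yet executable again."""
    outcome = {}
    for t in arrival_times:
        if fault_time <= t < ready_time:
            # Treated as an abnormality or disregarded.
            outcome[t] = "rejected"
        else:
            outcome[t] = "served"
    return outcome


# Hypothetical timeline: fault at time 3, executable again at time 8.
result = handle_requests([1, 4, 6, 9], fault_time=3, ready_time=8)
print(result)
```

In this sketch the requests arriving at times 4 and 6 are lost, although the application program would be able to execute them once it has become executable again.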