1. Field of the Invention
The present invention relates to an information processing apparatus and to a method and computer program for controlling this apparatus.
2. Description of the Related Art
In an embedded device that requires high functionality and high-speed processing, the system becomes enormous if all of its functions are driven by a single CPU, and this can impair maintainability and performance. Meanwhile, LSI integration is advancing, and it has become easy to mount multiple CPUs on one chip. In the development of embedded devices, an effective approach is therefore to drive a plurality of subsystems on a chip equipped with a plurality of CPUs and to have each subsystem execute, on a per-function basis, the processing required by the system. When a failure occurs in such a system equipped with a plurality of subsystems, however, detecting the failure is said to be difficult. Because the subsystems are driven by separate CPUs, another subsystem can continue operating even if one subsystem halts; consequently, there are instances where a failure in the overall system goes undiscovered even though some subsystems have shut down. Accordingly, a method of detecting failure based upon whether or not a response to a command is returned between subsystems is in wide use as a subsystem failure detection method.
For example, according to a technique described in the specification of Japanese Patent Laid-Open No. 5-181760, one subsystem issues a command to a separate subsystem and, if a time-out occurs before a response to the command is returned, it is determined that the latter subsystem has halted. According to the specification of U.S. Pat. No. 4,453,210 (Japanese Patent Laid-Open No. 55-138149), subsystem failure is sensed using a shared memory. A counter is placed in a common memory shared by the subsystems, updating of the counter by one subsystem is monitored by a separate subsystem, and it is determined that the former subsystem has halted if the counter is not updated.
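The two conventional approaches described above can be sketched as follows. This is a minimal illustration only and is not taken from either cited specification: the queues stand in for an inter-subsystem command channel, `SharedCounter` stands in for a counter in common memory, and the names `command_responds`, `SharedCounter` and `counter_advanced` are hypothetical.

```python
import queue


def command_responds(request_q, response_q, timeout_s):
    """Issue a command and report whether a reply arrives before the time-out."""
    request_q.put("PING")  # hypothetical command payload
    try:
        response_q.get(timeout=timeout_s)
        return True   # a reply was received in time
    except queue.Empty:
        return False  # time-out: the peer subsystem is judged to have halted


class SharedCounter:
    """Stand-in for a counter placed in memory shared by the subsystems."""

    def __init__(self):
        self.value = 0


def counter_advanced(counter, last_seen):
    """Return (alive, current); alive is False if the counter has not changed."""
    current = counter.value
    return current != last_seen, current
```

In such a sketch the monitoring subsystem would call `command_responds` or `counter_advanced` periodically and treat a `False` result as a halt of the monitored subsystem.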
In general, failures that occur in a subsystem fall into two kinds: device failure, which is ascribable to a fault in a device such as a CPU or memory, and system operation shutdown, such as deadlock ascribable to a software bug within the subsystem. In a case where device failure has occurred, a measure such as notifying the user is required. In a case where system operation shutdown has occurred, on the other hand, recovery is possible by software recovery processing. It would therefore be desirable if the kind of failure that has occurred in one subsystem could be detected by another subsystem.
With the method of detecting failure using a command, however, it is not possible to detect that some software in a subsystem has deadlocked and halted. The reason is that command processing is a combination of interrupt processing and task processing, and it is not possible to discriminate in which of these layers a command failure has occurred. Further, there are instances where, even though some tasks have halted owing to deadlock or the like, the task relating to command processing is executed preferentially and the command still succeeds. With the existing failure detection method using a counter, a single counter is used, and failure of the subsystem as a whole is detected by confirming that the counter is updated. With this method as well, there are instances where it is not possible to detect that some software in a subsystem has deadlocked and halted. The reason is that in a case where the task that updates the counter is executed at a priority higher than that of the task in which deadlock occurred, the counter continues to be updated normally, independently of the deadlocked task.
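The limitation of the single-counter method can be illustrated with a toy simulation. The sketch below assumes a simplified fixed-priority scheduler in which every ready task runs once per tick; the task names, priorities and the `simulate` function are hypothetical and serve only to show that the counter keeps advancing while two lower-priority tasks remain deadlocked.

```python
class Task:
    """A schedulable task in a simulated subsystem (hypothetical model)."""

    def __init__(self, name, priority):
        self.name = name
        self.priority = priority
        self.blocked = False  # True while the task waits on a resource
        self.runs = 0         # number of times the task has executed


def simulate(ticks):
    """Run a toy scheduler and return (counter, runs per task)."""
    counter = 0  # the single counter watched by the monitoring subsystem
    heartbeat = Task("heartbeat", priority=10)
    task_a = Task("task_a", priority=1)
    task_b = Task("task_b", priority=1)

    # Assumed scenario: each task holds a resource the other one needs,
    # so both stay blocked forever (deadlock).
    task_a.blocked = True
    task_b.blocked = True

    tasks = sorted([heartbeat, task_a, task_b],
                   key=lambda t: t.priority, reverse=True)
    for _ in range(ticks):
        for task in tasks:  # ready tasks run in priority order
            if task.blocked:
                continue
            task.runs += 1
            if task is heartbeat:
                counter += 1  # the heartbeat task updates the counter
    return counter, {t.name: t.runs for t in tasks}


counter, runs = simulate(ticks=5)
print(counter, runs)  # 5 {'heartbeat': 5, 'task_a': 0, 'task_b': 0}
```

A monitor that only checks whether the counter changes would judge this subsystem healthy, although `task_a` and `task_b` never execute.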
Accordingly, the present invention provides a technique for accurately detecting failure that has occurred in a subsystem.