1. Field of the Invention
This invention relates to an emergency resumption processing apparatus for an information processing system, such as an exchange system, whose hardware is multiplexed.
2. Description of the Related Art
In an information processing system such as an exchange system, each piece of system hardware is often multiplexed (generally duplexed). In such a system, operation can be continued by switching to a normal system, even if a fault arises in the system presently used.
When operation is switched from the abnormal system to a normal one, the service of the entire system shuts down temporarily during emergency resumption processing, because the new system can begin normal service only after special processing for emergency resumption has been performed. The service shutdown time should be as short as possible for a highly public information processing system such as an exchange system.
FIG. 1 shows a general configuration of an exchange system, each piece of hardware of which is duplexed.
In FIG. 1, the central processing units 1--0 and 1--1 control the exchange processing of communication information. Switch circuit networks 2--0 and 2--1 switch the communication paths of communication lines. Main memory devices 3--0 and 3--1 hold the exchange processing program and the emergency resumption processing program while they are executed. Each program is loaded from an external memory device (described later) when it is to be executed. External memory devices 4--0 and 4--1 store the exchange processing program and the emergency resumption processing program. Channel devices 5--0 and 5--1 are interface devices that connect the external memory devices 4--0 and 4--1 to the main memory devices 3--0 and 3--1 or the central processing units 1--0 and 1--1.
FIG. 2 shows a prior art system for switching from an abnormal system to a normal system and for having the latter execute an emergency resumption processing when an abnormality arises in the system currently in use in an exchange system whose configuration is duplexed as shown in FIG. 1. In FIG. 2, configuring elements or processing units other than the central processing units 1--0 and 1--1 of FIG. 1 are omitted. Watchdog timers 11--0 and 11--1 are counters that are counted up by a system clock, not shown in the drawing, and reset by the central processing units 1--0 and 1--1.
Now, a case is considered in which an exchange processing is being executed by a system comprising, e.g., the central processing unit 1--0, the switch circuit network 2--0, the main memory device 3--0, the external memory device 4--0 and the channel device 5--0 shown in FIG. 1. In this case, the central processing unit 1--0 outputs an ACT signal of logical "1" indicating that the system is currently in use, and also periodically resets the watchdog timer 11--0 based on the control of an exchange processing program executed by the same device, which prevents the counter value of the watchdog timer 11--0 from overflowing when the system is in use.
If, for example, an abnormality arises in the main memory device 3--0 (shown in FIG. 1), the central processing unit 1--0 runs wild. Because the central processing unit 1--0 can no longer execute the exchange processing program on the main memory device 3--0 normally, the periodic resetting of the watchdog timer 11--0 by that program also ceases. The counter value of the watchdog timer 11--0 therefore overflows after a predetermined time period has elapsed, and the timer outputs a carry signal. This carry signal is inputted to an EMA control unit 16 through an "AND" gate 12--0 (the ACT signal remains at the logical "1") and an "OR" gate 15.
Since the central processing unit 1--1 is not in use, it does not output an ACT signal of logical "1". Therefore, even if the counter value of the watchdog timer 11--1 overflows and a carry signal is output, the signal is prevented from being input from "AND" gate 12--1 to the EMA control unit 16. Upon receiving the carry signal from the watchdog timer 11--0, the EMA control unit 16 outputs a pulse signal to an EMA state counter 10 after forcibly resetting the central processing units 1--0 and 1--1. The counter value of the EMA state counter 10 is counted up by the above pulse signal from the EMA control unit 16 and outputted to the central processing units 1--0 and 1--1.
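The watchdog and gating behavior described above can be sketched in software as follows. This is only an illustrative model, not the circuit of FIG. 2: the class name `WatchdogTimer`, the functions `ema_trigger` and `first_carry`, and the tick counts are all assumptions introduced for the example.

```python
class WatchdogTimer:
    """Counter advanced by a system clock; an overflow produces a carry signal."""

    def __init__(self, limit):
        self.limit = limit  # hypothetical overflow threshold, in clock ticks
        self.count = 0

    def tick(self):
        """Advance one clock tick and return True (carry) on overflow."""
        self.count += 1
        if self.count >= self.limit:
            self.count = 0
            return True
        return False

    def reset(self):
        """Issued periodically by a central processing unit that runs normally."""
        self.count = 0


def ema_trigger(carry_0, act_0, carry_1, act_1, external_watch=False):
    """Model of the two "AND" gates feeding the "OR" gate.

    A carry signal is passed on only if the same system's ACT signal is "1";
    the external watch unit's control signal enters the "OR" gate directly.
    """
    return bool((carry_0 and act_0) or (carry_1 and act_1) or external_watch)


def first_carry(wdt, ticks, reset_every=None):
    """Return the first tick at which the timer overflows, or None if it never does."""
    for t in range(1, ticks + 1):
        if wdt.tick():
            return t
        if reset_every is not None and t % reset_every == 0:
            wdt.reset()  # periodic reset by the running program
    return None
```

With a threshold of 5 ticks, `first_carry(WatchdogTimer(5), 20, reset_every=3)` returns `None` because the periodic reset keeps the counter from overflowing, whereas `first_carry(WatchdogTimer(5), 20)` returns `5`. A carry from the system not in use is masked: `ema_trigger(False, False, True, False)` is `False`.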
Meanwhile, the central processing units 1--0 and 1--1 have a table as shown in FIG. 3 corresponding to the above counter value. (Numbers in FIG. 3 which are the same as those in FIG. 1 indicate the same configuring elements or processing units.) The central processing unit designated by the above counter value operates in a system shown in the table of FIG. 3 corresponding to its counter value. Thus, as shown above, when a system comprising the central processing unit 1--0, the switch circuit network 2--0, the main memory device 3--0, the external memory device 4--0 and the channel device 5--0 executes an exchange processing, the counter value of the EMA state counter 10 is "00".
Then, as described above, when the watchdog timer 11--0 detects an abnormality in the main memory device 3--0 and the EMA control unit 16 outputs a count-up pulse to the EMA state counter 10, the counter value of the EMA state counter 10 changes from "00" to "01". Hence, the central processing unit 1--0 selects the main memory device 3--1 and the external memory device 4--0, according to the table in FIG. 3 corresponding to the counter value "01". The above actions cause the main memory device 3--0 with an abnormality to be switched to the main memory device 3--1.
Furthermore, the central processing unit 1--0 loads the program for emergency resumption processing to the switched main memory device 3--1 from the external memory device 4--0. By executing the loaded emergency resumption processing program, the exchange system resumes its service.
The above explains a case in which an abnormality arises in the main memory device 3--0 while the system comprising the central processing unit 1--0, the switch circuit network 2--0, the main memory device 3--0, the external memory device 4--0 and the channel device 5--0 executes exchange processing. If instead an abnormality arises in the central processing unit 1--0, for example, then even when the new system shown in FIG. 3 resumes operation following the change in the counter value of the EMA state counter 10 from "00" to "01", the watchdog timer 11--0 immediately detects an abnormality again, because the central processing unit 1--0 itself runs wild. Thus, the counter value of the EMA state counter 10 further changes to "10", and the new system comprising the central processing unit 1--1 resumes operation. Likewise, no matter which system has an abnormality, by having the counter value of the EMA state counter 10 change sequentially, a normal system automatically resumes operation.
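The sequential switching driven by the EMA state counter can be illustrated with a small table lookup. This is a sketch under stated assumptions: only the entries for "00" and "01" and the central processing unit for "10" are given in the text, so the memory devices listed for "10", the whole of entry "11", the two-bit counter width, and the names `CONFIG_TABLE`, `EmaStateCounter` and `resume_until_healthy` are hypothetical. Units are written "1-0", "3-1" and so on purely as labels.

```python
# Entries 0b00 and 0b01 follow the description; the memory devices for 0b10
# and all of 0b11 are placeholders, because FIG. 3 is not reproduced here.
CONFIG_TABLE = {
    0b00: {"cpu": "1-0", "main_memory": "3-0", "external_memory": "4-0"},
    0b01: {"cpu": "1-0", "main_memory": "3-1", "external_memory": "4-0"},
    0b10: {"cpu": "1-1", "main_memory": "3-0", "external_memory": "4-1"},  # memories assumed
    0b11: {"cpu": "1-1", "main_memory": "3-1", "external_memory": "4-1"},  # assumed
}


class EmaStateCounter:
    """Counter advanced by each pulse from the EMA control unit (width assumed)."""

    def __init__(self, bits=2):
        self.modulus = 1 << bits
        self.value = 0

    def count_up(self):
        self.value = (self.value + 1) % self.modulus
        return self.value


def resume_until_healthy(faulty, counter, table):
    """Count up until the selected configuration contains no faulty unit.

    Returns the list of counter values that were tried, in order.
    """
    tried = [counter.value]
    while any(unit in faulty for unit in table[counter.value].values()):
        if len(tried) == len(table):
            raise RuntimeError("no fault-free configuration in the table")
        counter.count_up()  # stands in for watchdog overflow plus EMA pulse
        tried.append(counter.value)
    return tried
```

For a fault in the main memory device 3--0, `resume_until_healthy({"3-0"}, EmaStateCounter(), CONFIG_TABLE)` returns `[0, 1]`, matching the change from "00" to "01". For a fault in the central processing unit 1--0, the same call with `{"1-0"}` returns `[0, 1, 2]`, matching the further change to "10".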
An external watch unit 13 watches the operating mode of the central processing units 1--0 and 1--1 via signal interfaces 14--0 and 14--1, in case the watchdog timers 11--0 and 11--1 cannot detect a system abnormality. When it detects a system abnormality, the external watch unit 13 outputs a control signal similar to the carry signal described earlier via the "OR" gate 15 to the EMA control unit 16. As a result, the system is switched, as when a system abnormality is detected by the watchdog timers 11--0 and 11--1. In this case, the external watch unit 13 detects the system abnormality by tracing the program execution status at the central processing units 1--0 and 1--1. Thus, detecting a system abnormality in this way takes a relatively long time (on the order of 10 minutes, for example).
In the above described prior art, a fault may arise in the function (including the function of the channel device 5--0 or 5--1) of transmitting the emergency resumption processing program from the external memory device 4--0 or 4--1 to the main memory device 3--0 or 3--1. If such a fault arises, the central processing unit 1--0 or 1--1 may erroneously receive a response from the channel device 5--0 or 5--1 indicating that loading is complete, although the emergency resumption processing program has not actually been loaded on the main memory device 3--0 or 3--1, and may then begin executing whatever program is on that main memory device. Of course, because this is not an execution of the emergency resumption processing program, it causes the system to run wild. Therefore, if a fault arises in the function of transmitting the emergency resumption processing program, a system abnormality is ordinarily detected again, because the watchdog timer 11--0 or 11--1 ceases to be reset during the emergency resumption processing. Hence, the EMA control unit 16 counts up the counter value of the EMA state counter 10 by one, and requests the central processing unit 1--0 or 1--1 to execute the emergency resumption processing again.
However, there is a possibility that the programmed command to reset the watchdog timer 11--0 or 11--1 remaining on the main memory device 3--0 or 3--1 is accidentally executed, because the above described program runs wild. If such a situation happens, the watchdog timer 11--0 or 11--1 cannot detect a system abnormality.
In this case, as discussed earlier, the external watch unit 13 detects the system abnormality. Yet, as also described earlier, since it takes a long time (about 10 minutes) for the external watch unit 13 to detect the system abnormality, the emergency resumption processing is not reactivated promptly.
Thus, when a fault arises in the transmission function of the emergency resumption processing program, the emergency resumption processing of the exchange system is not executed for a long time, causing a problem that the exchange service is disrupted for a long period.
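The effect on detection time can be shown with simple arithmetic. The sketch below is illustrative only: the function `first_detection`, the watchdog overflow period of 1 second, and the idea that the runaway program executes the leftover reset command at a fixed interval are all assumptions; only the figure of roughly 10 minutes (600 seconds) for the external watch unit comes from the description.

```python
def first_detection(wdt_timeout_s, external_watch_s, accidental_reset_s=None):
    """Return (seconds, detector) for the first detection of a runaway program.

    accidental_reset_s is the hypothetical interval at which the runaway
    program happens to execute a leftover watchdog-reset command
    (None means it never does).
    """
    # The watchdog is masked if it is reset before it can overflow.
    masked = accidental_reset_s is not None and accidental_reset_s < wdt_timeout_s
    if not masked and wdt_timeout_s <= external_watch_s:
        return wdt_timeout_s, "watchdog timer"
    return external_watch_s, "external watch unit"
```

Under these assumptions, `first_detection(1.0, 600.0)` returns `(1.0, "watchdog timer")`, while `first_detection(1.0, 600.0, accidental_reset_s=0.5)` returns `(600.0, "external watch unit")`: once the watchdog timer is accidentally kept from overflowing, the service interruption grows from the watchdog period to the external watch interval.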