The present invention relates to a composite computer system in which a plurality of independently operating processors access common resources under exclusive control. More particularly, the present invention relates to a technique which can effectively be applied to a composite computer system which quickly detects a fault occurring when the plurality of processors access the common resources under exclusive control, and which then executes alternative processing.
In an existing load-distribution, cooperative composite computer system in which a plurality of interconnected processors share resources such as magnetic disk units or magnetic tape units, cooperation between the processors has been established by connecting input/output units, such as an interchannel coupling unit, so that the processors communicate with each other by means of input/output instructions.
However, in such an existing composite computer system, if interconnection with a processor of a distant station is disabled due to a fault in a channel or in a communication path, or due to a fault such as system-down, exclusive processing of the resources used in common can no longer be continued.
Therefore, when no response from the processor in the distant station is detected, a message indicating that an unresponsive processor has been detected is issued to an operator. The fault has thus been located by human judgment, and jobs have been continued by processing appropriate to the fault that occurred.
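The prior-art behavior described above can be illustrated with a minimal sketch. It assumes that each processor records the time of the last reply received from each distant processor and treats silence longer than a fixed limit as no-response. The names `PeerStatus`, `find_unresponsive`, and `operator_messages`, the timeout value, and the message wording are all hypothetical and are not taken from the cited system. The sketch stops at issuing a message, because locating the fault and choosing the recovery action is left to the operator.

```python
from dataclasses import dataclass


@dataclass
class PeerStatus:
    """Hypothetical record kept for one distant processor."""
    name: str
    last_response: float  # time of the last reply, in seconds


def find_unresponsive(peers, now, timeout):
    """Return the names of processors silent for longer than `timeout`."""
    return [p.name for p in peers if now - p.last_response > timeout]


def operator_messages(peers, now, timeout):
    """Build one operator message per unresponsive processor.

    No automatic recovery is attempted here; the message only asks
    the operator to locate the fault.
    """
    return [
        f"NO RESPONSE FROM PROCESSOR {name}: OPERATOR ACTION REQUIRED"
        for name in find_unresponsive(peers, now, timeout)
    ]


if __name__ == "__main__":
    peers = [PeerStatus("CPU-A", 100.0), PeerStatus("CPU-B", 40.0)]
    for message in operator_messages(peers, now=105.0, timeout=30.0):
        print(message)
```

Because the only output is a message, processing stays suspended until a person responds.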
The response procedures for fault detection in the existing composite computer system are described as "Operator Procedures for MSCF Fault" in the manual entitled "System Operation of Program Product VOS3/AS - JSS3 -", published in December 1994 by Hitachi Limited.
Moreover, in order to reduce communication overhead between the plurality of processors in the existing composite computer system, a memory for exclusive control, used for management of the common resources, is provided so that the processors can cooperate effectively. For example, lock information for exclusive control is arranged in a non-volatile control memory provided for each volume of duplicated magnetic disk units, and this lock information in the control memory is used by the disk double-writing control program.
In the disk double-writing control program, when a processor updates the lock information, the plurality of processors cooperate with each other by making use of a function which reports the update to the other processors as an asynchronous input/output interrupt. However, if a processor goes system-down while holding the lock information in the existing composite computer system, the other processors, which are operating normally, cannot reserve the lock information for access to the double-written magnetic disk unit. As a result, an input/output timeout occurs and processing cannot be continued.
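The failure mode described above can be illustrated with a minimal sketch, given here under stated assumptions rather than as the actual control program. The class `ControlMemoryLock`, the callback list standing in for the asynchronous input/output interrupt, and the bounded retry loop standing in for the input/output timeout are all hypothetical.

```python
class ControlMemoryLock:
    """Hypothetical lock information held in the control memory of one volume."""

    def __init__(self):
        self.holder = None    # name of the processor holding the lock
        self.listeners = []   # stand-in for asynchronous I/O interrupts

    def subscribe(self, callback):
        """Register a processor to be notified of lock updates."""
        self.listeners.append(callback)

    def try_acquire(self, processor):
        """Reserve the lock if it is free; report the update to listeners."""
        if self.holder is None:
            self.holder = processor
            self._notify(processor, "acquired")
            return True
        return False

    def release(self, processor):
        """Free the lock; only the current holder may do so."""
        if self.holder != processor:
            raise RuntimeError(f"{processor} does not hold the lock")
        self.holder = None
        self._notify(processor, "released")

    def _notify(self, processor, event):
        for callback in self.listeners:
            callback(processor, event)


def acquire_with_timeout(lock, processor, max_attempts):
    """Retry a bounded number of times; False models an I/O timeout."""
    for _ in range(max_attempts):
        if lock.try_acquire(processor):
            return True
    return False


if __name__ == "__main__":
    lock = ControlMemoryLock()
    lock.try_acquire("CPU-A")
    # CPU-A goes system-down here and never calls release().
    print(acquire_with_timeout(lock, "CPU-B", max_attempts=5))
```

In this sketch nothing distinguishes a holder that is merely slow from one that has gone down, so the waiting processor can only time out, and the lock stays reserved until someone clears it by hand.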
In the prior art, exclusive control of common resources is executed either by communication between a plurality of processors using input/output units such as the interchannel coupling units explained above, or by giving the lock information to one processor, as in the disk double-writing control program explained above. In both cases a processor cannot judge the operating conditions of the other processors, so intervention by the operator is necessary to cancel the lock information when a fault occurs in the processor holding it. Therefore, in the composite computer system of the prior art, manual recovery procedures must be prepared in advance for each assumed combination of faults. Preparing such procedures increases the load of operating the composite computer system.
The existing composite computer system thus has a problem that the load on the operator, such as the preparation of operation procedures, is increased. In addition, long periods of automatic operation cannot be sustained, because when no response from the processor of the distant party is detected, a message indicating that an unresponsive processor has been detected is issued to the operator. To continue the job, the location of the fault must then be identified by human judgment.
Moreover, when a processor goes system-down while holding the lock information in the disk double-writing control program of the existing composite computer system, cancellation of the lock information requires the operator's intervention. A manual recovery procedure assuming each particular combination of faults must therefore be prepared beforehand, which places a load on operation.