The present invention relates to a data processing system in multiprocessing and, more particularly, to a control system for serializing reference/update operations on shared data stored in a main storage among a plurality of tightly-coupled processing units.
As is well known in the art, multiprocessing, in which a plurality of processing units share a main storage, is a desirable system configuration in view of processing ability and reliability. In multiprocessing, communication functions have to be provided among the processing units to coordinate their processing. These functions include one that temporarily restrains one processing unit from referencing/updating shared data stored in the main storage until another processing unit completes its referencing/updating of the data.
In multiprocessing, processing units frequently contend for common data in the main storage, typically a central processing unit (CPU) versus a channel or a CPU versus a CPU. In particular, the mechanisms heretofore proposed for serializing the operations in the CPU versus CPU situation may be classified into two types, i.e., a suspend type and a spin type, as described below.
Referring to FIG. 2, which represents the suspend type mechanism, LOCK is an instruction having a single operand that designates a certain address of the main storage. The address is associated with a specific resource such as a shared table. The LOCK instruction fetches the data at the address designated by its operand and checks a specific bit of the data; if the bit is "0", it changes the bit to "1", stores the data back at the same address, and execution proceeds to the instruction following the LOCK instruction. If the specific bit is already "1", an interrupt is generated without changing the content of the designated address. In response to the interrupt, control passes to a control (supervisor) program, which suspends execution of the task in which the interrupt occurred, places it in a wait state, and then shifts control to another task. UNLOCK, the instruction paired with LOCK, fetches the data at the address designated by its operand and checks the specific bit; if the bit is "1", it changes the bit to "0", stores the data back at the same address, and then generates an interrupt. The control program then makes the task waiting for this resource executable and returns to the instruction following the UNLOCK instruction in the task in which the interrupt occurred, so that its execution continues. If the specific bit is "0", which indicates an error, an interrupt is generated and the control program may take any suitable measure, such as aborting the task. FIG. 2 shows a condition wherein a task l being executed in a first processing unit 5 has owned a resource A and, accordingly, the task in a second processing unit 6 that requests the resource A waits, its execution being started only after the resource A is released. In the meantime, the unit 6 executes other tasks, so that no overhead develops from waiting.
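To make the sequence concrete, the following is a minimal Python simulation of the suspend type LOCK/UNLOCK pair. It is a sketch under stated assumptions, not hardware: the class and method names are hypothetical, a dictionary stands in for the main storage, the supervisor's interrupt handling is modeled as ordinary method calls, and a waiting task is assumed to re-execute its LOCK instruction once it is dispatched.

```python
from collections import deque


class SuspendLockSupervisor:
    """Simulation of the suspend type scheme (hypothetical names).

    lockbit maps a resource address to 0 (free) or 1 (owned); the two
    _*_interrupt methods play the role of the control (supervisor) program.
    """

    def __init__(self):
        self.lockbit = {}     # resource address -> 0 or 1
        self.waiting = {}     # resource address -> queue of suspended tasks
        self.ready = deque()  # tasks made executable again by the supervisor

    def lock(self, task, addr):
        """LOCK: set the bit and continue if it was 0, else interrupt."""
        if self.lockbit.get(addr, 0) == 0:
            self.lockbit[addr] = 1
            return True                   # proceed to the next instruction
        self._lock_interrupt(task, addr)  # the lockbit is left unchanged
        return False

    def unlock(self, task, addr):
        """UNLOCK: clear the bit and interrupt; a 0 bit is an error."""
        if self.lockbit.get(addr, 0) == 0:
            # Error case: the supervisor might, for example, abort the task.
            raise RuntimeError(f"{task} unlocked free resource {addr}")
        self.lockbit[addr] = 0
        self._unlock_interrupt(addr)

    def _lock_interrupt(self, task, addr):
        # Supervisor: put the task in a wait state for this resource.
        self.waiting.setdefault(addr, deque()).append(task)

    def _unlock_interrupt(self, addr):
        # Supervisor: make one task waiting for this resource executable.
        queue = self.waiting.get(addr)
        if queue:
            self.ready.append(queue.popleft())
```

For example, if task l locks A and task m then attempts to lock A, m is queued in waiting["A"]; when l unlocks A, m moves to ready and succeeds when it re-executes LOCK.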
Another proposed method reduces the frequency of interrupts and, thereby, the overhead. In this method, an UNLOCK instruction unconditionally changes the data at the address designated by its operand to "0" and stores it back, generating no interrupt. Because the start of execution of a suspend-locked task is then treated in the same manner as ordinary dispatching of processing units to tasks, the task may be suspend-locked again when its execution is started.
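The variant in which UNLOCK clears the lockbit unconditionally can be sketched as follows, under the same simulation assumptions (hypothetical names, a single-threaded model, and a supervisor that eventually redispatches suspended tasks). It illustrates that a suspended task is simply returned to ordinary dispatching and must retry its LOCK, so another task may take the resource first and the retried task is suspended again.

```python
from collections import deque


class NoInterruptUnlock:
    """Hypothetical model of the variant whose UNLOCK raises no interrupt."""

    def __init__(self):
        self.lockbit = 0
        self.suspended = deque()  # tasks put in a wait state by LOCK
        self.interrupts = 0

    def lock(self, task):
        if self.lockbit == 0:
            self.lockbit = 1
            return True
        self.interrupts += 1      # LOCK still interrupts on contention
        self.suspended.append(task)
        return False

    def unlock(self):
        self.lockbit = 0          # unconditional clear, no interrupt

    def dispatch_suspended(self):
        """Ordinary dispatching: the next suspended task simply reruns LOCK."""
        if not self.suspended:
            return None, False
        task = self.suspended.popleft()
        return task, self.lock(task)
```

Here a third task that locks the resource between the UNLOCK and the redispatch causes the first waiting task to be suspended once more, costing a second interrupt.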
In accordance with another method, the word designated by the operand of a LOCK or UNLOCK instruction (lockword), or a particular bit field of it (lockbit), may comprise a counter that reflects the number of waiting tasks in addition to information showing whether or not the associated resource is in use. The counter indicates an unused resource by "0" and a resource in use by "1" or larger, a value of n+1 meaning that n tasks are waiting. A LOCK instruction increments the counter by "1" and, if the result is "2" or larger, generates an interrupt. An UNLOCK instruction decrements the counter by "1" and, if the result is "1" or larger, generates an interrupt. In response to the interrupt, the control program suspend-locks the task or makes one of the waiting tasks executable as the new owner of the lock. Interrupts are significantly suppressed if the result of the decrement by an UNLOCK instruction is rarely "1" or larger, that is, if contention for the resource is rare.
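A minimal sketch of the counting lockword follows, assuming hypothetical names and a single-threaded model in which each method runs as one indivisible instruction. While the resource is in use the counter equals 1 plus the number of waiting tasks; on handover the waiting task becomes the owner without incrementing the counter again, so the invariant is preserved.

```python
from collections import deque


class CountingLockword:
    """Counter lockword: 0 = free, 1 = in use, n + 1 = in use with n waiters."""

    def __init__(self):
        self.count = 0
        self.waiters = deque()
        self.owner = None
        self.interrupts = 0

    def lock(self, task):
        self.count += 1
        if self.count >= 2:       # contention: interrupt, suspend the task
            self.interrupts += 1
            self.waiters.append(task)
            return False
        self.owner = task
        return True

    def unlock(self):
        self.count -= 1
        if self.count >= 1:       # waiters remain: interrupt, hand over lock
            self.interrupts += 1
            self.owner = self.waiters.popleft()
        else:
            self.owner = None     # no waiters: no interrupt at all
```

An uncontended LOCK/UNLOCK pair leaves the interrupt count at zero, which is the saving the method aims at.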
Another possible approach leaves control to the task itself in response to LOCK and UNLOCK instructions, without generating any interrupt. Specifically, in response to a LOCK instruction, the lockbit designated by the operand is checked; if it is "0", it is changed to "1" and stored back in the same location, and processing continues by branching to a position a fixed number of addresses ahead. If it is "1", the instruction immediately following the LOCK instruction is executed. The instructions located after the LOCK instruction and just before that branch target relinquish control of the task itself and, once control is regained, branch back to the LOCK instruction. For an UNLOCK instruction, it suffices to change the lockbit designated by the operand unconditionally to "0". It will be noted that the lockword or lockbit may also be controlled by treating "0" as lock-on and the other values as lock-off, and the same holds in the following description.
The role assigned to hardware is to provide the LOCK and UNLOCK instructions. The primary requirements are that each operation be performed by a single instruction, that the cache in the processing unit be bypassed during the operation, and that no interrupt be allowed during the operation.
Characteristic features of the suspend type scheme are that, from the system standpoint, no limitation is imposed on the period of time for which a resource is owned, that a processing unit wastes no time waiting, and that the suspend type is applicable not only to multiprocessing but also to uniprocessing, for managing resources shared by tasks.
The other mechanism, a spin type mechanism, is shown in FIG. 3. The spin type mechanism, like the suspend type, is widely applicable to controlling the serial use of a resource among tasks in multiprocessing. However, its major application is in the non-task portion of the system, i.e., in routines that control the tasks themselves. The locus of a procedure, comprising a series of tasks or routines, that is under way in a processing unit is referred to as a "process".
A LOCK instruction has two operands, each designating a certain address of the main storage. The address of the first operand is associated with a shared table or a like specific resource. The address of the second operand designates the location to which the LOCK instruction branches. The LOCK instruction checks the lockword/lockbit designated by the first operand and, if it is lock-off, changes it to lock-on and stores it back in the original location, after which the next instruction is executed. If the lockword/lockbit is lock-on when checked, the instruction branches to the address designated by the second operand without changing anything. The lockword/lockbit is read and written bypassing the cache in the processing unit, and no interrupt is allowed in the course of the instruction's operation. An UNLOCK instruction unconditionally changes the lockword/lockbit designated by its operand to lock-off.
In the example shown in FIG. 3, since a process l being executed in the first processing unit 5 has secured a resource A first, a process m in the second processing unit 6 branches to an address B. The address B is the entry point of a resource wait routine that is shared by all the processing units and has a recursive program structure. There, control loops for up to a predetermined period of time until the resource A is released and, immediately after the release, control is transferred back to the process m, which then owns the resource A by issuing another LOCK instruction. If the predetermined period expires before the resource A is released, an error is assumed to have occurred. If the resource wait routine itself checks, by means of a LOCK instruction, whether the resource A is in use, the resource A is already owned at the moment of its release, which eliminates the need to reissue a LOCK instruction when control is transferred to the process m.
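The spin type LOCK and the resource wait routine at address B can be sketched with Python threads, assuming that threading.Lock.acquire(blocking=False), a non-blocking test-and-set, stands in for the LOCK instruction and that a returned False corresponds to branching to the second operand address. The names acquire, resource_wait and SpinTimeout and the time limits are hypothetical. Because the wait routine retests with LOCK itself, it returns with the resource already owned, so the caller does not reissue LOCK. One difference from the text: threading.Lock.release raises RuntimeError on an unlocked lock, whereas the UNLOCK described here is unconditional.

```python
import threading
import time


class SpinTimeout(Exception):
    """The predetermined spin period expired: treated as an error."""


def lock(word):
    """LOCK: non-blocking test-and-set. False means 'branch to address B'."""
    return word.acquire(blocking=False)


def resource_wait(word, limit_s):
    """Resource wait routine at B: spin, retesting with LOCK, until owned."""
    deadline = time.monotonic() + limit_s
    while not lock(word):
        if time.monotonic() >= deadline:
            raise SpinTimeout("resource not released within the spin limit")
    # Returns with the resource already owned; no second LOCK is needed.


def acquire(word, limit_s=0.5):
    if not lock(word):            # lockword already lock-on
        resource_wait(word, limit_s)


def unlock(word):
    word.release()                # UNLOCK: set the lockword to lock-off
```

A process that calls acquire while another process owns the resource spins in resource_wait and proceeds as soon as the owner calls unlock, or raises SpinTimeout if the limit expires first.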
The spin type mechanism usually involves some limitations in use because the processing unit itself waits by looping, or spinning. For the spin time to be finite for error-checking purposes, interrupts must be inhibited while a resource is owned by a process. Usually, the duration for which a processing unit may operate continuously in an uninterruptible (interrupt-inhibited) mode is limited according to the processing ability designed for that unit and, hence, the maximum spin time cannot exceed that limit. Moreover, where a process spinning in one processing unit has already secured some resources, a deadlock due to contention, or interlock, may occur when a process in another processing unit desires to own any of those resources.
To preclude such a possibility, the spin type mechanism employs a predetermined rule concerning the sequence in which locks are owned and released. For example, locks may be owned and released in the alphabetical order of the resource labels. Although the spin type is ineffective in uniprocessing, serialization of reference/update operations without spinning is available there: because interrupts are inhibited while a resource is owned, no other process can run on the single processing unit until the resource is released.
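The ordering rule can be illustrated as below, assuming hypothetical resource labels and a helper named hold. For brevity the sketch uses blocking threading.Lock objects rather than spinning; the point shown is only that two processes requesting the same resources in different orders cannot deadlock when every process locks them in alphabetical order of label.

```python
import threading
from contextlib import contextmanager

# Hypothetical shared resources, each protected by its own lock.
LOCKS = {label: threading.Lock() for label in ("A", "B", "C")}


@contextmanager
def hold(*labels):
    """Own several resources, always locking them in alphabetical order.

    Because every process acquires in the same global order, no cycle of
    processes, each waiting for a lock held by the next, can form.
    """
    ordered = sorted(set(labels))
    for label in ordered:
        LOCKS[label].acquire()
    try:
        yield
    finally:
        # Released in the same alphabetical order, following the rule in the
        # text; the release order does not affect deadlock freedom.
        for label in ordered:
            LOCKS[label].release()
```

A process asking for ("B", "A") and another asking for ("A", "B") both lock A first, so whichever obtains A proceeds and the other simply waits.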
Some suspend type and spin type mechanisms employ firmware to control the LOCK and UNLOCK instructions of the hardware mechanism and present them to software as ENQ/DEQ macro instructions or semaphore control instructions, thereby serializing the use of resources.
The most difficult problem with serialization in multiprocessing is that, when one of the processing units is stopped while holding a lock, another processing unit is brought into a loop or a wait state and, as a result, processing of the whole system is halted.
Generally, to stop a processing unit that is executing a task, the task is aborted by a control program in another processing unit or, alternatively, the task is rolled back to a convenient point in its processing and then rerun. If such a task has owned a lock, it is very difficult for another processing unit to carry out this processing without contradiction. This is because the lock has to be released while the resource associated with the lockword/lockbit is returned to its state before the lock was owned, by postprocessing in the aborting scheme and by preprocessing in the rollback scheme. For a routine constituting a basic portion of the control program, aborting means a system-down and, since even rolling back its processing by means of another program is impossible, such a stop is usually treated as a system-down.
In light of the above, a multiprocessing system has been proposed that allows processing to continue correctly. At the step of locking a resource for use, each task or routine, under its own control, saves the data of the resource as it was before reference/update into a suitable area of the main storage. A recovery function is introduced for the stop described above: the original state of the resource is restored from the saved data and the lock is released. While one processing unit is stopped, another processing unit sequentially performs the recovery processing associated with the task or routine. For details of this type of system, reference may be made to a paper entitled "Design of tightly-coupled multiprocessing programming" by J. S. Arnold et al., IBM Systems Journal, No. 1, 1974, pp. 60-87.
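The save-and-restore idea can be sketched as follows. The class and method names are hypothetical and the model is single-threaded; copy.deepcopy stands in for saving the resource data into a main storage area at lock time, and recover is what a surviving processing unit would run for a task stopped while holding the lock.

```python
import copy


class RecoverableResource:
    """Hypothetical sketch of save-before-update recovery."""

    def __init__(self, data):
        self.data = data
        self.locked_by = None
        self._saved = None   # copy of the data taken when the lock was owned

    def lock(self, owner):
        if self.locked_by is not None:
            return False
        self.locked_by = owner
        self._saved = copy.deepcopy(self.data)  # save before reference/update
        return True

    def unlock(self, owner):
        if self.locked_by != owner:
            raise RuntimeError(f"{owner} does not own the resource")
        self.locked_by = None
        self._saved = None

    def recover(self, stopped_owner):
        """Run by a surviving processing unit for a stopped owner."""
        if self.locked_by != stopped_owner:
            return False
        self.data = self._saved      # restore the state before the update
        self.locked_by = None        # release the lock
        self._saved = None
        return True
```

The cost the following paragraph criticizes is visible even here: every lock incurs a copy of the resource, and every module that touches a resource must go through this protocol.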
The problem encountered with the prior art systems is that the modules of all tasks or routines must incorporate the above-discussed provisions for every resource that requires serialization, resulting in considerably massive programs. Another problem is that the recovery processing is incomplete, because it is sometimes impossible to return the resources to their states before use.