The invention relates generally to the use of lockwords for establishing control and queuing in a multi-task computer environment and, in particular, relates to task locking that avoids spin locking.
In a computer system involving only a single processor executing a single task at any time, the control of computer resources presents no problem. The type of resources being referred to can be mass memories, tape drives, printers or communication channels, although other types of resources are understood to exist. Only one task exists to have access to any resource, and that task maintains its control over the resource, as well as over the central processing unit, until the requested resource has completed its activity.
However, multi-tasking and multiprocessor systems have become popular. These systems allow simultaneous or interleaved execution of multiple tasks, with the resources somehow shared between the simultaneously executing tasks. Some resources, like a printer or a tape drive, operate such that the requesting task requires exclusive access to that resource for at least some period. Other resources, such as parts of a common storage area, may be shared among various tasks. In order to arrange controlled access to a resource, a queue is set up for all tasks that have requested access to a resource but are not granted immediate access. The queue must further contain the information as to whether a task in the queue is requesting shared or exclusive access to the resource and whether the resource is currently being used on a shared or exclusive basis.
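The queue described above can be pictured as a small data structure. The following is a minimal sketch only: the names `struct resource`, `struct waiter`, `request_access` and `MAX_WAITERS` are hypothetical, and the rule that a shared request does not overtake tasks already waiting is an assumption made for illustration. The sketch deliberately ignores concurrent callers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a resource and its wait queue. */
enum access_mode { MODE_NONE, MODE_SHARED, MODE_EXCLUSIVE };

#define MAX_WAITERS 8   /* hypothetical fixed capacity */

struct waiter {
    int task_id;              /* which task is waiting */
    enum access_mode wanted;  /* shared or exclusive request */
};

struct resource {
    enum access_mode held_as;          /* how the resource is used right now */
    int holders;                       /* number of tasks currently using it */
    struct waiter queue[MAX_WAITERS];  /* ordered list of waiting tasks */
    size_t queued;                     /* number of entries in the queue */
};

/* Returns 1 if access is granted immediately, 0 if the task was queued.
 * Assumed policy: a shared request joins current sharers only when
 * nobody is already waiting. */
int request_access(struct resource *r, int task_id, enum access_mode wanted) {
    assert(wanted != MODE_NONE && r->queued < MAX_WAITERS);
    int compatible =
        r->held_as == MODE_NONE ||
        (r->held_as == MODE_SHARED && wanted == MODE_SHARED && r->queued == 0);
    if (compatible) {
        r->held_as = wanted;
        r->holders += 1;
        return 1;
    }
    /* Not granted: append the request to the end of the ordered queue. */
    r->queue[r->queued].task_id = task_id;
    r->queue[r->queued].wanted = wanted;
    r->queued += 1;
    return 0;
}
```

Two shared requests on an idle resource are both granted at once, while a following exclusive request is placed in the queue along with its requested mode.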
Thus, when a task requests access to a resource but is refused immediate access, the operating system rearranges the queue to reflect the addition of the requesting task to the queue. However, in a multi-tasking environment, the possibility always exists that two or more tasks will request access to a particular resource at almost the same time and, if not prevented, will proceed to rearrange the queue concurrently. This rearrangement largely involves serializing the queue, that is, setting up an ordered list of the tasks in the queue. If this rearrangement is performed concurrently by two different tasks, one of the requesting tasks may fail to join the queue or, even worse, the entire organization of the queue may be destroyed.
In order to avoid these problems, a lockword is established for each resource. If the queue for that resource is currently being rearranged, the lockword indicates this fact, and the operating system prevents a second task from manipulating the queue. However, if the lockword indicates that no queue manipulations are in progress, then the requesting task first changes the lockword to assert ownership of queue manipulations for that resource and proceeds to rearrange the queue according to its requirements.
At the end of the queue manipulation, the lockword is reset to a state indicating that no queue manipulation is currently in progress.
To avoid all possibility of two tasks concurrently rearranging the queue, the initial testing and setting of the lockword must be performed such that only a single task can at any time be performing this pair of operations.
In an IBM System/370, designed for a multi-tasking environment, there is a test and set instruction which can fetch a word from memory, test for a specific bit and return a modified word to the memory, all during one operation in which all other tasks or processors are barred from accessing that particular word in memory. The fetch and return-store form an atomic unit or atomic reference which, once begun, cannot be interrupted by or interleaved with any other CPU in a multi-processor. The test and set instruction can therefore be used to test a lockword and to set it for ownership. The set of operations is described in Table 1, in which one bit of the byte LOCKWORD is tested for zero, indicating availability of the lockword. LOCKWORD is immediately rewritten with this bit set to "1". The result of this testing is retained and used in the next step by a conditional branch BC. If the testing was not successful, i.e., the lockword was owned by another task or processor, execution branches back to the label retry and repeats the test and set operation. When the lockword is available and ownership of the lockword is established, a series of operations is performed in which the queue is manipulated by this requesting task or processor. While this manipulation is proceeding, no other task can manipulate the queue because this task owns the lockword. When the manipulation has been completed, a final instruction rewrites the lockword to indicate that it is once more available. LOCKWORD is set to zero, indicating that the queue is once more available to other requesting tasks or processors.
TABLE 1
______________________________________
retry    TS     LOCKWORD
         BC     CC1,retry
         . . .
         alter queue
         . . .
         MVI    LOCKWORD,0
                SPIN-LOCK
______________________________________
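For readers more familiar with C than with System/370 assembler, the sequence of Table 1 can be approximated with C11 atomics. This is a sketch under stated assumptions, not the patent's code: `atomic_exchange` stands in for the TS instruction, the names `lockword`, `try_acquire`, `spin_acquire`, `release` and `enqueue_one` are hypothetical, and the protected queue is reduced to a simple counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lockword: false = available, true = owned. */
atomic_bool lockword = false;

/* Hypothetical stand-in for the protected queue: just its length. */
int queue_length = 0;

/* One test-and-set attempt: atomically write "owned" and report whether
 * the lockword was available beforehand. */
bool try_acquire(void) {
    return !atomic_exchange(&lockword, true);
}

/* Counterpart of "retry TS LOCKWORD / BC CC1,retry": keep retrying
 * until the lockword was found available. */
void spin_acquire(void) {
    while (!try_acquire()) {
        /* busy-wait; this is the "spinning" that wastes processor time */
    }
}

/* Counterpart of "MVI LOCKWORD,0": mark the lockword available again. */
void release(void) {
    assert(atomic_load(&lockword));  /* releasing an unowned lock is a bug */
    atomic_store(&lockword, false);
}

/* Acquire the lockword, alter the queue, release the lockword. */
void enqueue_one(void) {
    spin_acquire();
    queue_length += 1;   /* "alter queue" */
    release();
}
```

A second `try_acquire()` made while the lockword is owned returns false, which corresponds to the branch back to retry in Table 1.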
The above series of operations is called spin-locking because, if a task cannot gain ownership of a lockword, it keeps spinning, that is, trying to obtain such ownership, until the using task finally relinquishes control. Such spinning is wasteful and in some situations can severely degrade the throughput of a multi-tasking computer. A particularly bad situation occurs if one processor is in a spin lock because a second processor owns the lockword, and the second processor then fails before it relinquishes the lockword. In this case, the first processor continues to spin for an indefinite time because of the failure of another processor.
A pictorial illustration of the hierarchy involved in task locking implemented with test and set is shown in FIG. 1(A). Test and set is too primitive to provide direct identification of the owning task or processor for a lockword when a CPU failure occurs, or to provide for more than one owner of a resource. For this reason, test and set is used to control manipulation of small queues, which may consist of single elements, that in turn control the manipulation and examination of other queues. These elements allow the identification of the task or processor owning the queue when a CPU failure occurs and provide the ability for more than one task to have ownership of a queue at the same time, as might be useful for tasks which examine a queue without altering it.
The enhanced spin locks are, in turn, used to control the manipulation of queues for which tasks, but not processors, are suspended until the required resource becomes available. Requests which allow concurrent ownership by others are called "shared" requests. Requests which allow ownership by no other component are called "exclusive" requests. Requests which cause the task to be suspended without suspension of a processor are called "task locks".
The first level of task locks provides control for resources and queues on which the operating system is dependent for its continued operation. These are called "supervisor task locks", since they are available only to supervisory programming.
One of the supervisor task locks is used, in turn, to control the manipulation of queues which provide control for resources and queues on which the operating system is not dependent for its continued operation. These are called "application task locks", since they are available to any programming.
The importance of the hierarchy is that application task locking requires four levels of operations. The multiplicity of levels produces a system that is complex and slow in operation.
An important capability of the System/370 series of processors is made possible by two instructions, named "compare and swap" and "compare double and swap". The two instructions differ only in that compare and swap operates on single length words, while compare double and swap operates on double length words. As used here, a word is four bytes (thirty-two bits) long while double words are twice that length. Because the embodiment to be described later uses double words, only compare double and swap is described.
The compare double and swap or CDS operates on three operands so that it assumes the form of CDS (OLD, NEW, LOCK), where OLD, NEW and LOCK are double length words. The effect of CDS is illustrated in FIG. 2. If the value of LOCK equals or matches the value of OLD, the LOCK is replaced with the value of NEW; however, if LOCK does not match OLD, then OLD is replaced with LOCK and LOCK remains unchanged. A condition code CC is set depending on the outcome of the test for LOCK=OLD. This condition code can be used to separate the operational flow depending on the success of the test.
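The effect of CDS described above maps closely onto the C11 function `atomic_compare_exchange_strong`. The following minimal sketch models it on a 64-bit value; the function name `cds` and its parameter order are chosen here to mirror CDS (OLD, NEW, LOCK), and the returned values (0 for a match, 1 for a mismatch) are an assumed encoding of the condition code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of CDS (OLD, NEW, LOCK) on a 64-bit double word.
 * If *lock equals *old, *lock is replaced with new_value.
 * Otherwise *old is replaced with the current *lock, and *lock is unchanged.
 * Returns an assumed condition code: 0 on match, 1 on mismatch. */
int cds(uint64_t *old, uint64_t new_value, _Atomic uint64_t *lock) {
    assert(old != NULL && lock != NULL);
    /* The compare and the conditional store happen as one atomic step. */
    return atomic_compare_exchange_strong(lock, old, new_value) ? 0 : 1;
}
```

When the comparison fails, the caller finds the current value of LOCK in OLD and can branch on the returned code, for example to compute a fresh NEW and try again.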
The CDS operation shares with test and set the attribute that it is an atomic reference. That is, it fetches and stores back into memory in a single operation that cannot be interrupted by any other processor. Although FIG. 2 shows five operations, CDS is accomplished as though it were a single operation. This atomic character, coupled with its similarity to test and set, allows CDS to replace the test and set operation in a supervisor spin exclusive. Indeed, means have been described elsewhere to use compare double and swap in both a supervisor spin share and a supervisor task exclusive. These possible uses of CDS are shown in the hierarchy of FIG. 1(B). The result is that, for exclusive tasks, only three levels of operations are required for application task locking. Until now, shared access task locking has required the use of a supervisor spin or exclusive task lock to control access to the controls which, in turn, are used to provide shared access and to coordinate shared access with exclusive access requests for the same queues or resources.
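As an illustration of CDS taking the place of test and set in an exclusive spin lock, the sketch below stores an owner identifier in the lockword, so that the owner can be identified from the lockword itself. This is a hypothetical layout, not one given in the text: the value 0 is assumed to mean "no owner", and the names `NO_OWNER`, `spin_acquire_exclusive`, `release_exclusive` and `current_owner` are invented for the example. The C11 function `atomic_compare_exchange_strong` stands in for CDS.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NO_OWNER UINT64_C(0)   /* hypothetical: 0 means the lock is free */

/* Spin until the lockword changes from "free" to the caller's owner ID. */
void spin_acquire_exclusive(_Atomic uint64_t *lock, uint64_t owner_id) {
    assert(owner_id != NO_OWNER);
    uint64_t expected = NO_OWNER;
    while (!atomic_compare_exchange_strong(lock, &expected, owner_id)) {
        /* On failure, expected holds the current owner; reset and retry. */
        expected = NO_OWNER;
    }
}

/* Release succeeds only if the caller is the recorded owner.
 * Returns 1 on success, 0 if the lock is held by someone else or is free. */
int release_exclusive(_Atomic uint64_t *lock, uint64_t owner_id) {
    uint64_t expected = owner_id;
    return atomic_compare_exchange_strong(lock, &expected, NO_OWNER) ? 1 : 0;
}

/* Read the recorded owner, e.g. to identify it after a failure. */
uint64_t current_owner(_Atomic uint64_t *lock) {
    return atomic_load(lock);
}
```

This remains a spin lock, so a waiting processor still spins; the sketch only shows that the compare-and-swap form leaves the owner's identity in the lockword.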
Because the suspension and resumption of a task cannot itself be suspended and resumed as a task, the lock which controls its queues must necessarily be a spin lock. If this spin lock is not the same lock used to control access to the controls used to provide shared access, another level of locking may be introduced when a task must be suspended or resumed for the lack of availability or the reappearance of availability of a resource. These three levels are an improvement over the four levels required with test and set. However, for shared tasks, the three levels still add unwanted complexity to the system and slow its operation.