Parallel execution of processes is common in computer systems today, and the serialization problem addressed in this application has been addressed in prior computer systems. But the prior techniques have not been as effective, nor have they attained the efficiency of system operation attainable with the subject invention, which uses a new blocking symbol technique. The problem is that data shared by multiple processors cannot safely be accessed and modified in parallel. A group of accesses by one processor must be executed one at a time but must appear as a single logical, or atomic, unit to other processors in order to preserve the integrity of the data in the shared computer resource. That is, the accesses by the processor must be serialized with respect to accesses by other processors.
The invention deals with preserving the integrity of multi-processor shared data contained in data structures in a computer system. These shared data structures are herein referred to as "resources" of the computer system, which for example include data files, queues of any type, buffers for temporarily holding data, changeable programs, etc.; in other words, any computer data structure which may be changed by parallel processes.
Serialization in accessing a resource by multiple processes (programs being executed by multiple processors) means that only one process at a time is allowed to access the resource. Accessing includes both processor fetching and storing to addressed locations in the main storage of a computer system. Programs may perform multiple accesses to a single data structure, and these accesses must be serialized as a single unit of operation if the same program, or other programs, executing on other processors of the system are always to see a consistent view of the resource.
An example of serialization is: if process 1 stores the values A and B at locations X and Y, respectively, and process 2 stores the values C and D at locations X and Y, respectively, it may be imperative that the final contents of X and Y are either A and B or C and D but not A and D or C and B.
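The first example can be checked by brute force. The following is a minimal illustrative sketch in Python, not part of any disclosed system; the function names `interleavings`, `run`, `unserialized_outcomes`, and `serialized_outcomes` are hypothetical. It enumerates every order in which the four stores can reach storage and compares the reachable final contents of X and Y with those reachable when each process's pair of stores is treated as one unit.

```python
def interleavings(a, b):
    """Yield every merge of step lists a and b that keeps each list's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Apply (location, value) stores in order; return final (X, Y)."""
    storage = {}
    for location, value in schedule:
        storage[location] = value
    return storage["X"], storage["Y"]


PROCESS_1 = [("X", "A"), ("Y", "B")]
PROCESS_2 = [("X", "C"), ("Y", "D")]


def unserialized_outcomes():
    """Final states when individual stores may interleave freely."""
    return {run(s) for s in interleavings(PROCESS_1, PROCESS_2)}


def serialized_outcomes():
    """Final states when each process's two stores form one atomic unit."""
    return {run(PROCESS_1 + PROCESS_2), run(PROCESS_2 + PROCESS_1)}
```

Under these assumptions the unserialized set contains the mixed results (A, D) and (C, B), while the serialized set contains only (A, B) and (C, D).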
A second example is: If locations X and Y contain A and B, respectively, and process 1 stores C and D at X and Y, respectively, and process 2 fetches from locations X and Y, it may be imperative that process 2 fetches either A and B or C and D but not A and D or C and B.
A third example is: if processes 1 and 2 fetch from location Z and, because the fetched values are a predetermined value P, process 1 stores Q at location Z and process 2 stores R at location Z, process 1 may conclude that location Z contains Q while process 2 concludes that Z contains R.
The third example is addressed by U.S. Pat. No. 3,886,525 to Brown et al., assigned to the same assignee as the present invention. U.S. Pat. No. 3,886,525 discloses a Compare and Swap (CS) instruction. CS has a first operand and a third operand in registers and a second operand in storage at a specified address. CS compares the first operand to the second operand and, if they are equal, stores the third operand at the second-operand location and indicates this result by a condition code 0. If the first and second operands are not equal, CS loads the second operand into the first-operand register and indicates this different result by a condition code 1. CS has the novel feature that, between the time when the second operand is fetched and the time when CS completes by setting condition code 0 or 1, no other instruction, executed by another processor, is allowed to store at or fetch from the second-operand location. This effect is achieved by CS locking, in the cache of its processor, the line containing the second-operand location. This novel feature of CS is called an interlocked-update reference.
In the third example, if both processes fetch and store to location Z by means of CS, then only one of the processes will fetch the value P, and only that process will store at location Z.
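The behavior of CS in the third example can be modeled in software. The following Python sketch is illustrative only: the class `Word` and the functions `compare_and_swap` and `race` are hypothetical names, and the `threading.Lock` inside `Word` is an assumption that stands in for the hardware interlocked-update reference (it does not model cache-line locking).

```python
import threading


class Word:
    """A storage location. The private lock is an assumed stand-in for the
    hardware's interlocked-update reference."""

    def __init__(self, value):
        self.value = value
        self._interlock = threading.Lock()


def compare_and_swap(first, word, third):
    """Model of CS. Returns (condition_code, first_operand_after)."""
    with word._interlock:
        if word.value == first:
            word.value = third       # equal: store the third operand
            return 0, first
        return 1, word.value         # unequal: load second operand into first


def race():
    """Third example: processes 1 and 2 both try to replace P at location Z."""
    z = Word("P")
    codes = {}

    def process(name, new_value):
        codes[name], _ = compare_and_swap("P", z, new_value)

    threads = [threading.Thread(target=process, args=(1, "Q")),
               threading.Thread(target=process, args=(2, "R"))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return z.value, codes
```

In this model exactly one of the two processes receives condition code 0, and location Z ends up holding that process's value; the other process receives condition code 1 and learns the current contents of Z.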
The CS instruction is not applicable to the first and second examples since X and Y in those examples are two different storage locations which most probably are in different cache lines. Processors cannot each lock two cache lines since this may lead to deadlock. For example, if processor 1 has locked a line containing X and is attempting to lock a line containing Y, processor 2 may have locked a line containing Y and be attempting to lock a line containing X. Now neither processor can proceed.
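The deadlock described above is a cycle in which each processor waits for a line that the other holds. As a minimal sketch, assuming a simplified model in which each processor holds at most one line and requests at most one line, the hypothetical Python function `is_deadlocked` below detects such a cycle.

```python
def is_deadlocked(holds, wants):
    """holds: processor -> line it has locked.
    wants: processor -> line it is trying to lock.
    Returns True if some processor is waiting, directly or indirectly, on itself."""
    owner = {line: processor for processor, line in holds.items()}
    for start in wants:
        seen = set()
        processor = start
        # Follow the chain: processor wants a line, which is held by its owner.
        while processor in wants and wants[processor] in owner:
            processor = owner[wants[processor]]
            if processor == start:
                return True
            if processor in seen:
                break
            seen.add(processor)
    return False


# Processor 1 holds the line with X and wants Y; processor 2 holds Y and wants X.
CROSSED = is_deadlocked({1: "X", 2: "Y"}, {1: "Y", 2: "X"})

# Processor 2 merely waits for a line that processor 1 will eventually release.
SIMPLE_WAIT = is_deadlocked({1: "X"}, {2: "X"})
```

Here `CROSSED` is true and `SIMPLE_WAIT` is false, which matches the text: waiting on one locked line is harmless, whereas two processors each holding one line and requesting the other cannot proceed.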
The first and second examples have been addressed in practice by a programmed lock. A simple programmed lock can be implemented by means of the CS instruction as described for the third example. In that example, the second-operand location can represent a lock, and the predetermined value P can have the meaning that the lock is not currently held. If either process 1 or 2, using CS, replaces P with the third-operand value for the process (Q for 1 or R for 2), then that process has obtained the lock and is allowed, by programming convention, to access the locations X and Y of the first or second example. The other process, by programming convention, is not allowed to access X and Y until it obtains the lock. It cannot obtain the lock until the first process (which may be 1 or 2) has replaced the contents of Z with P.
The practical problem then arises of what a process is to do while it is waiting for the lock represented by location Z. A solution which is practicable in certain cases is for the process to treat the lock as a "spin lock", that is, the process repeatedly executes a CS instruction (spins on the lock) until it obtains the lock. This solution wastes the time of the processor that is executing the process and is only practicable if it can be assured that the lock will not be held for more than a brief time. In the general case, the lock can be held for a very long time because the process holding it may, for example, encounter a page fault when accessing the location X or Y of the first or second example. This will cause the process to be interrupted in order for the control program to resolve the page fault. Since it may take a long time to do that, the control program may undispatch the process and dispatch another process in its place. The control program may even swap out the address space containing the process. Thus, a spin lock is practicable only when it is known that there cannot be a page fault and when the processor is disabled for other asynchronous interruptions such as an I/O interruption.
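The programmed lock and its use as a spin lock can be sketched as follows. This Python model is illustrative only; `Word`, `compare_and_swap`, `acquire`, `release`, `store_pair`, and `demo` are hypothetical names, and the internal `threading.Lock` is an assumed stand-in for the interlocked-update reference. The value P at location Z means the lock is free, and a process obtains the lock by replacing P with its own value (Q or R).

```python
import threading

FREE = "P"  # predetermined value meaning "lock not currently held"


class Word:
    """A storage location; the private lock models the CS interlock (assumption)."""

    def __init__(self, value):
        self.value = value
        self._interlock = threading.Lock()


def compare_and_swap(first, word, third):
    """Model of CS. Returns (condition_code, first_operand_after)."""
    with word._interlock:
        if word.value == first:
            word.value = third
            return 0, first
        return 1, word.value


def acquire(lock, owner):
    """Spin: repeat CS until P is replaced by this process's value."""
    while True:
        code, _ = compare_and_swap(FREE, lock, owner)
        if code == 0:
            return


def release(lock, owner):
    """Restore P so that another process can obtain the lock."""
    code, _ = compare_and_swap(owner, lock, FREE)
    if code != 0:
        raise RuntimeError("lock released by a process that does not hold it")


def store_pair(lock, owner, shared, x, y):
    """By programming convention, touch X and Y only while holding the lock."""
    acquire(lock, owner)
    try:
        shared["X"] = x
        shared["Y"] = y
    finally:
        release(lock, owner)


def demo(rounds=100):
    lock = Word(FREE)
    shared = {"X": "A", "Y": "B"}

    def worker(owner, x, y):
        for _ in range(rounds):
            store_pair(lock, owner, shared, x, y)

    threads = [threading.Thread(target=worker, args=("Q", "A", "B")),
               threading.Thread(target=worker, args=("R", "C", "D"))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return lock.value, (shared["X"], shared["Y"])
```

The loop in `acquire` is the spin: it consumes processor time for as long as the lock is held, which is why the approach is acceptable only when the holder is certain to release the lock quickly.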
In the general case, the solution to the problem of what to do when a lock is held is to use control program services. For example, a lock can be represented by an event control block (ECB). A process that finds the lock held can invoke a Wait service specifying the ECB, which causes the process to be undispatched. The process holding the lock can release the lock by invoking a Post service specifying the same ECB, which causes the waiting process to be placed on a queue of processes eligible to be redispatched. The use of these services can be very time consuming, and it is highly desirable that some hardware-assisted method, beyond the simple CS instruction, be available to provide locking in order to serialize the use of resources.
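The Wait and Post pattern can be illustrated by analogy. The Python sketch below does not use or model the actual control program services; it assumes that `threading.Condition` from the Python standard library plays the role of the ECB, with `wait()` standing in for Wait and `notify()` standing in for Post. The names `BlockingLock` and `demo` are hypothetical.

```python
import threading


class BlockingLock:
    """A lock whose waiters are suspended instead of spinning (analogy only)."""

    def __init__(self):
        self._ecb = threading.Condition()  # assumed stand-in for an ECB
        self._holder = None                # None means the lock is free

    def acquire(self, who):
        with self._ecb:
            while self._holder is not None:
                self._ecb.wait()           # like Wait: the thread is suspended
            self._holder = who

    def release(self, who):
        with self._ecb:
            if self._holder != who:
                raise RuntimeError("lock released by a non-holder")
            self._holder = None
            self._ecb.notify()             # like Post: one waiter becomes eligible


def demo(rounds=100):
    lock = BlockingLock()
    shared = {"X": "A", "Y": "B"}

    def worker(who, x, y):
        for _ in range(rounds):
            lock.acquire(who)
            try:
                shared["X"] = x
                shared["Y"] = y
            finally:
                lock.release(who)

    threads = [threading.Thread(target=worker, args=(1, "A", "B")),
               threading.Thread(target=worker, args=(2, "C", "D"))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["X"], shared["Y"]
```

Compared with a spin lock, a waiting thread here consumes no processor time, but every contended acquire and release passes through the scheduler, which corresponds to the service overhead noted above.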
U.S. Pat. No. 5,081,572 to Arnold, assigned to the same assignee as the present invention, discloses Compare and Swap Disjoint (CSD) and Compare and Load (CL) instructions for use as in the first and second examples above. CSD and CL are said to perform interlocked-update references to two locations. In practice, the way of implementing Arnold's interlocked-update references is to quiesce all processors except the one executing the CSD or CL instruction. This is very wasteful of the time of the quiesced processors and is inefficient.
Another way of replacing a programmed lock is a classification of a subset of computer instructions into a locking class for controlling access to a general type of resource to be protected by use of the instructions in that class. This is disclosed in U.S. Pat. No. 5,333,297 to Lemaire et al. (owned by the same assignee as the subject application), and in later-filed U.S. Pat. No. 5,488,729 to Vegesna et al. Neither of these patents discloses blocking symbols, which are an essential component of the subject invention. The Lemaire et al. patent classifies subsets of computer instructions into locking "classes", each identified by a particular operation code and dedicated to a general type of resource in a computer system, with the type indicated by their instruction operation code. The Vegesna et al. patent classifies instructions according to the execution unit which is to execute the instruction, which is a different type of instruction classification from that in the Lemaire et al. patent. In Lemaire, each instruction class (determined by the operation code) is designed to atomically make data changes in a general type of resource, associated with that instruction class, such as double-threaded queues. A severe restriction on use of the Lemaire instruction-classification invention is that it prevents any other instruction in a locking class from executing on currently unused resources when any instruction is executing in the same locking class on any other processor of the CEC (Central Electronic Complex), which comprises multiple processors. Such locks are provided in each processor's hardware-microcode and not in a centralized hardware-microcode storage area separate from each processor. The lock on a resource is duplicated in each processor's hardware, and inter-processor communications are required to coordinate the state of the lock between these multiple copies of each lock.
The subject invention does not use either interlocked-update references or classes of instructions for controlling its locking, and this invention does not suffer from their deficiencies. The resource serialization done by the subject invention does not have the Lemaire restriction of allowing only one of multiple processors executing Lemaire's instruction classes to access one resource at a time. Instead, the subject invention allows multiple processors to simultaneously access plural resources in parallel as long as each of the processors is accessing a different resource using a different lock and a different blocking symbol. The subject invention introduces the use of blocking symbols (not found in any prior art) for identifying resources and controlling their access serialization. The subject invention uses blocking symbols in a new type of instruction to allow the users of a system (e.g. its programmers and programs) to have more precise access control, more resource accessing parallelism, and finer resource access granularity than the techniques taught by any known prior art.
The lock classification of instructions (as done in Lemaire et al.) binds the association of instruction class to resource type at the time a computer architecture is designed to use such instruction classes. The architectural decisions must be made before a computer design is released for manufacturing a computer. Thus computers built with Lemaire's lock-classified instructions may not later be architecturally changed, which may prohibit the software association of locks with later-installed resources in a computer system. In most computer installations, new programs are always being developed and many old programs changed, so resources to be serialized will be defined, or changed, over time from program to program. The invention in the subject application enables late software binding of locks to resources of any granularity, usable for controlling the serialized accessing of newly added resources. The locks are chosen and specified to the machine dynamically, for example by the programs. The machine enforces the serialization of operations on the resources represented by the program-specified blocking symbols.
Also, unlike Lemaire et al., this invention does not require broadcasts on inter-CPU buses at the initiation of each processor instruction execution in order to signal every other processor.