The present invention relates generally to multiprocessor computing systems which facilitate the simultaneous execution of multiple tasks in a single system. More particularly, the invention relates to methods and apparatus which allow each of certain critical instructions, all performing multiple main storage accesses to shared data, to have the appearance of executing required main storage accesses atomically with respect to a predefined set or class of instructions.
2. Description of the Related Art
In uniprocessor computer systems, which involve only a single processor executing a single task at any given time, the control of computer resources presents few problems. The type of resources being referred to includes memory, communication channels, I/O devices, etc., although other types of resources are understood to exist. In such systems, only the task being executed can have access to any of the system's resources. Accordingly, each task maintains its control over any required resources, as well as the central processing unit itself, until the task has completed its activity.
In effect, each instruction in a uniprocessor system is designed to be "atomic", where an atomic instruction is defined to be indivisible, i.e., it appears as a single unit of work. In a uniprocessor system, two instructions from two different task instruction streams cannot appear to execute at the same time (with interleaved fetches or stores) since task switches (where one task is swapped out before another is swapped in) are constrained to occur on instruction boundaries (or at intermediate checkpoints within an instruction for very long instructions).
By contrast, control of processor access to system resources is essential in multitasking and multiprocessor computer systems, since these systems allow simultaneous or interleaved execution of multiple tasks which share resources. Various prior art schemes have been developed to control such accesses, for example, task queues, locking schemes, etc.
In known computer systems that provide for multitask/multiple processor operation on shared data, such as the IBM System/370, certain critical instructions are defined, each of which performs multiple main storage accesses on shared data in an atomic fashion. In such a system, a critical instruction executing on a given CPU appears to perform all of its accesses (to a main storage location) without any other CPU being able to access the same storage location in between the first and last access by the given CPU.
In the IBM System/370, instructions such as TEST AND SET and COMPARE AND SWAP are defined, each of which performs multiple main storage accesses in a manner that is indivisible from start to finish (i.e., is atomic).
The TEST AND SET instruction can be used to fetch a word from memory, test for a specific bit and return a modified word to the memory, all during one operation in which all other tasks of the other processor(s) are barred from accessing that particular word in memory. The fetch and return store forms an atomic unit or atomic reference which, once begun, cannot be interrupted by or interleaved with any other CPU in the multiprocessor system.
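The fetch, test and store sequence described above can be illustrated with a small software model. The following is only a sketch: the `Memory` class, its `_interlock` member and the choice of the leftmost bit of a byte as the tested bit (with all ones as the stored value) are assumptions made for illustration, and an ordinary Python lock stands in for the hardware interlock.

```python
import threading


class Memory:
    """Toy model of main storage (hypothetical, for illustration only)."""

    def __init__(self, size):
        self._bytes = [0x00] * size
        # Assumption: a single software lock models the hardware interlock.
        self._interlock = threading.Lock()

    def test_and_set(self, addr):
        """Fetch a byte, report its leftmost bit, and store all ones.

        The fetch and the store happen under the interlock, so no other
        thread can touch the location between them.
        """
        with self._interlock:
            old = self._bytes[addr]
            self._bytes[addr] = 0xFF
            return (old >> 7) & 1

    def store(self, addr, value):
        """Ordinary store; it must wait while a test_and_set is in progress."""
        with self._interlock:
            self._bytes[addr] = value & 0xFF
```

In this model the first `test_and_set` on a cleared byte returns 0 and every later call returns 1 until the byte is cleared again with `store(addr, 0)`.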
With respect to the COMPARE AND SWAP instruction, the fetch of an operand (for the purpose of the compare) and the store (for the purpose of the swap) into this operand's location appear to be a block-concurrent interlocked-update reference as observed by other CPUs, i.e., no other CPU will appear to perform any main storage fetch or store between the first CPU's fetch (for the compare) and store (for the swap). Thus, the main storage operations of the COMPARE AND SWAP also appear to be atomic as observed by other CPUs.
Obviously, in a multiprocessor environment like the IBM System/370, the fetch for the compare or test, and the store for the set or swap, must be done without any other CPU either fetching or storing data to a locked location between the first CPU's fetch and store. The instruction must be performed atomically with respect to all other instructions capable of running on any processor in the system.
It is well known that hardware can be used to lock a given main storage location to afford the required protection for an operand stored at the given location. Either all of the other CPUs' accesses to this location can be delayed, or only their interlocked accesses to this location can be delayed. An operand location based hardware locking scheme is used to support the processing of atomic instructions in prior art computers typified by the IBM System/370.
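The compare-then-store behavior described above can likewise be modeled in software. This is a minimal sketch rather than a description of any actual hardware: the `Word` class, the `atomic_increment` helper and the `run_increments` driver are hypothetical names, and a Python lock again stands in for the interlocked-update reference.

```python
import threading


class Word:
    """Toy model of one storage word (hypothetical, for illustration only)."""

    def __init__(self, value=0):
        self._value = value
        # Assumption: a software lock models the interlocked update.
        self._interlock = threading.Lock()

    def load(self):
        with self._interlock:
            return self._value

    def compare_and_swap(self, expected, new):
        """Store `new` only if the word still holds `expected`.

        Returns (swapped, observed). The compare and the store are done
        under the interlock, so they appear as one unit to other threads.
        """
        with self._interlock:
            observed = self._value
            if observed == expected:
                self._value = new
                return True, observed
            return False, observed


def atomic_increment(word):
    """Retry loop: recompute and retry whenever another thread got in first."""
    while True:
        old = word.load()
        swapped, _ = word.compare_and_swap(old, old + 1)
        if swapped:
            return old + 1


def run_increments(n_threads, n_iters):
    """Have several threads increment one shared word; return the final value."""
    word = Word(0)

    def worker():
        for _ in range(n_iters):
            atomic_increment(word)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return word.load()
```

Because a failed swap leaves the word unchanged, no increment is lost: `run_increments(4, 500)` ends with the word holding 2000.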
The atomic instructions themselves, such as the TEST AND SET and COMPARE AND SWAP instructions described hereinbefore, are often used to provide software with the ability to test a software lock and then to set the lock if it is not already set. This software capability is a means of guaranteeing the integrity of some function which might not work if a software lock were not available. After completing one or more general-purpose instructions which do the accesses (e.g., a LOAD instruction), software must then use another instruction to release the software lock.
The use of these software locks requires the calculation of lock addresses. The system overhead associated with address calculation can be significant, particularly when, for example, a tight loop is executed containing a compare against the location containing a lock bit. In this example, frequent and repeated address calculation for a given location is often required, thereby degrading processor performance.
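The software lock pattern described above (test the lock, set it if it is clear, perform the accesses, then release the lock with a separate instruction) can be sketched as follows. The `LockByte` and `SpinLock` classes and the `demo` driver are hypothetical; the tight loop in `acquire` is the one that, on real hardware, would repeatedly reference the lock location.

```python
import threading
import time


class LockByte:
    """Toy lock location with an atomic test-and-set (hypothetical model)."""

    def __init__(self):
        self._value = 0
        self._interlock = threading.Lock()  # stands in for the hardware interlock

    def test_and_set(self):
        with self._interlock:
            old = self._value
            self._value = 1
            return old

    def clear(self):
        with self._interlock:
            self._value = 0


class SpinLock:
    """Software lock built on the test-and-set primitive above."""

    def __init__(self):
        self._byte = LockByte()

    def acquire(self):
        # Tight loop: keep testing the lock location until it was found clear.
        while self._byte.test_and_set() == 1:
            time.sleep(0)  # yield so the holder can make progress

    def release(self):
        # A separate instruction is needed to release the software lock.
        self._byte.clear()


def demo(n_threads, n_iters):
    """Protect a non-atomic read-modify-write with the software lock."""
    lock = SpinLock()
    shared = {"total": 0}

    def worker():
        for _ in range(n_iters):
            lock.acquire()
            try:
                shared["total"] = shared["total"] + 1  # the protected accesses
            finally:
                lock.release()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["total"]
```

Every protected update costs at least one reference to the lock location on entry and one on exit, in addition to the accesses the software actually wanted to make.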
Accordingly, it would be desirable to be able to minimize the need to use software locks to preserve the integrity of a data structure (single location, linked list, etc.) thereby minimizing the need to perform the address calculations required when utilizing software locks.
Another scheme for preserving the integrity of shared data structures in a multiprocessor/multitask environment is an address locking mechanism based on partitioning shared memory and locking the partitions required by a CPU for the duration of a given atomic instruction. With such a scheme, processor performance degradation is an inverse function of the number of main storage partitions, which is in turn a function of the number of signals (identifying the partitions) that are provided between CPUs. Such a mechanism becomes unwieldy as the number of partitions grows; conceivably, however, all instructions can be made to appear atomic. Still, lock bits for each partition and address calculations to check, set and release the locks are required. Thus, no substantial improvement (in terms of address calculation) is realized utilizing a memory partitioning scheme over the locking scheme described hereinabove with reference to the IBM System/370.
Another problem inherent in the prior art related to address calculation is the impact on software which results when taking a program designed to run on a uniprocessor and executing the program in a multiprocessor/multitask environment. As pointed out hereinbefore, controlling access to shared data locations, more generally to shared data structures, is critical in a migration from a uniprocessor to a multiprocessor environment. Accordingly, it would also be desirable to minimize the impact on software resulting from such a migration by providing a computer system that utilizes means other than operand location based hardware locks and the aforementioned software locks to ensure the data integrity of shared data structures.
Prior art computer systems are also known which employ a Tightly Coupled Microprocessor feature to increase system performance. A computer system having a Tightly Coupled Microprocessor feature allows multiple identical processors to be coupled to a shared memory interface and allows multiple tasks to execute simultaneously in a single system. Such a system would benefit significantly if means other than operand location based hardware locks and software locks were available to ensure the integrity of shared data.
In fact, it would be desirable if a computer system were available where certain critical instructions were classified into instruction sets (i.e., were predefined) based on the data structures or object classes the instructions affect. Then (1) only the instructions in a given class would need to be locked out when an instruction in the class is being executed and (2) no address calculation would be required to lock out the instructions in a predefined class once any instruction in the class is identified as being executed by a given processor. Hardware could be used to lock out the remaining members of the instruction class.
In effect, instructions in each class would constitute a set of "relatively" atomic instructions. That is, rather than providing some atomic instructions that are atomic with respect to all instructions running on other processors (as in the IBM System/370), sets of relatively atomic instructions could be defined to guarantee that while a given relatively atomic instruction (from a given class) is executing, main storage facilities which are used by the relatively atomic instruction are not changed by other processors executing relatively atomic instructions from the same class.
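A memory partitioning scheme of this kind can be sketched in software as shown below. The `PartitionedStorage` class, its equal-sized contiguous partitions and the rule of acquiring locks in ascending partition order (added here only so the sketch cannot deadlock) are assumptions for illustration and are not taken from any particular prior art system.

```python
import threading


class PartitionedStorage:
    """Toy shared storage split into lockable partitions (hypothetical model)."""

    def __init__(self, size, partitions):
        self.cells = [0] * size
        # Assumption: equal-sized contiguous partitions (ceiling division).
        self.partition_size = -(-size // partitions)
        self.locks = [threading.Lock() for _ in range(partitions)]

    def partition_of(self, addr):
        # Address calculation is still needed to find which lock to take.
        return addr // self.partition_size

    def atomic_update(self, addrs, fn):
        """Apply fn to the values at addrs while holding every needed partition.

        fn receives the current values as a list and returns the new values
        in the same order.
        """
        needed = sorted({self.partition_of(a) for a in addrs})
        for p in needed:  # ascending order avoids deadlock between updaters
            self.locks[p].acquire()
        try:
            new_values = fn([self.cells[a] for a in addrs])
            for a, v in zip(addrs, new_values):
                self.cells[a] = v
        finally:
            for p in reversed(needed):
                self.locks[p].release()
```

With 16 cells and 4 partitions, an update touching addresses 1 and 9 locks partitions 0 and 2 only, leaving accesses to partitions 1 and 3 undisturbed. More partitions mean fewer unrelated accesses are delayed, but also more locks to track, and `partition_of` must still be evaluated for every operand.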
Instructions not in the same class of an executing relatively atomic instruction would be allowed to operate simultaneously on other processors. By definition, i.e., by not being in the same class, these instructions cannot affect the data structure being utilized by the executing relatively atomic instruction.
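The idea of class-based lockout can be modeled in software as a rough sketch. Everything named here is hypothetical: the `CLASS_LOCKS` table, the `relatively_atomic` decorator, the class names "QUEUE" and "COUNTER" and the three sample instructions. The point illustrated is that the lock is selected from the instruction's class alone, with no operand address involved.

```python
import threading
from functools import wraps

# Hypothetical table: one lock per predefined instruction class.
CLASS_LOCKS = {}


def relatively_atomic(instruction_class):
    """Mark a function as a member of the given instruction class.

    Members of the same class exclude one another; members of different
    classes may run at the same time.
    """
    lock = CLASS_LOCKS.setdefault(instruction_class, threading.Lock())

    def decorate(fn):
        @wraps(fn)
        def run(*args, **kwargs):
            with lock:  # chosen by class, not by operand address
                return fn(*args, **kwargs)
        return run

    return decorate


@relatively_atomic("QUEUE")
def enqueue(queue, item):
    queue.append(item)


@relatively_atomic("QUEUE")
def dequeue(queue):
    return queue.pop(0) if queue else None


@relatively_atomic("COUNTER")
def bump(counter):
    counter["n"] += 1
    return counter["n"]
```

While one thread is inside `enqueue`, another thread calling `dequeue` waits, but a thread calling `bump` proceeds immediately because it belongs to a different class. Note that this sketch serializes all "QUEUE" instructions even when they operate on different queue objects, which is the trade-off of locking by class rather than by location.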
As a result, processor performance across the desired multiprocessor system may improve relative to prior art systems. It would only be necessary to protect the particular data structure affected when a given relatively atomic instruction is being executed. The processors not executing instructions in the same class as the relatively atomic instruction would be free to continue processing.
Furthermore, a computing system which supports the processing of the aforementioned classes of relatively atomic instructions would support software migration from uniprocessor to multiprocessor systems and minimize the need for software locks in general. This is because the integrity of shared data would be based on affected data structure type (which is invariant between a uniprocessor and multiprocessor environment) and, as indicated hereinbefore, identification of an instruction as a member of a given class would be all that is needed to "lock out" the other class members via a hardware locking scheme. As a result, the software locks required heretofore would be eliminated.