1. Field of the Invention
The present invention generally relates to microprocessors and, in particular, to the execution of so-called "atomic" instructions in multiprocessor and/or multitasking machines. In the invention, atomics are implemented with a series of instructions that make maximal use of the existing hardware of the machine and require very little extra logic.
2. Description of the Related Art
In processor architectures and related technologies, an "atomic" instruction is a processor instruction that appears to be indivisible to other processors, and thus is performed in its entirety without interruption by other activities in the processor system. Atomics are primarily used in multi-processor and multitasking machine environments.
An atomic instruction is characterized by the two or three parts thereof. That is, while there are multiple kinds of atomics, they basically all execute two or three operations, i.e., all atomics have a load and a store, and some atomics also have a compare that conditionalizes the store.
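The two-part and three-part forms described above can be illustrated with a minimal sketch. It assumes the standard C11 `<stdatomic.h>` library; the function names `swap` and `compare_and_swap` are hypothetical and chosen only for illustration. It is a software analogy and does not represent the hardware mechanism of any particular machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdatomic.h>

/* Two-part atomic (hypothetical name): load the old value and store a new one. */
unsigned swap(atomic_uint *loc, unsigned new_value) {
    assert(loc != NULL);
    /* atomic_exchange performs the load and the store as one indivisible step. */
    return atomic_exchange(loc, new_value);
}

/* Three-part atomic (hypothetical name): load, compare, and store only if the
 * comparison succeeds. Returns the value that was loaded. */
unsigned compare_and_swap(atomic_uint *loc, unsigned expected, unsigned new_value) {
    assert(loc != NULL);
    unsigned old = expected;
    /* On a match, new_value is stored and old still equals expected.
     * On a mismatch, nothing is stored and old receives the actual contents. */
    atomic_compare_exchange_strong(loc, &old, new_value);
    return old;
}
```

In both functions the caller receives the loaded value, and in the three-part form the store takes place only when the loaded value matches `expected`.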
For example, consider the exemplary case shown in FIG. 1 where multiple processors 102, 104 and 106 all look to the same centralized memory location 108 for a shared or global resource 110. Access to the resource 110 is controlled by use of atomics to execute a so-called "compare and swap". Each processor is assigned a unique identifier number. In the instance where one processor wants access to the global resource 110, which only one of the processors is permitted to have at once, the software will execute a compare and swap on the memory location 112 that controls that resource 110.
That is, the number contained in the memory location 112 is loaded, and the loaded number is checked to see if it is zero. If it is not zero, then access to the resource 110 is denied. If it is zero, the identifier number of the processor seeking the resource 110 is stored in the memory location 112. In this example, no processor has an identifier number of zero. When the processor no longer needs the resource 110, zero is again stored in the memory location 112 to allow access by other processors.
It is therefore important that a processor be able to load the old number appearing in the memory location, check it to see if it is zero, and if it is zero, finish the store of its own number before any other processor might load that same zero. Atomic processing achieves this by treating the load, compare and store functions as a single indivisible instruction. In this case, only the one processor can see the old load data before the completion of the store.
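The acquire and release protocol described above can be sketched in software as follows. This is a minimal sketch assuming C11 `<stdatomic.h>`; the variable `control_word` stands in for the memory location 112, and the names `try_acquire` and `release_resource` are hypothetical. It shows only the software-visible behavior and not how a given machine implements the atomic in hardware.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for memory location 112: zero means the resource is free. */
static atomic_uint control_word = 0;

/* Try to claim the resource for the processor with the given nonzero identifier.
 * Returns true if access is granted, false if another processor holds it. */
bool try_acquire(unsigned processor_id) {
    assert(processor_id != 0);  /* zero is reserved to mean "free" */
    unsigned expected = 0;
    /* Load the control word, compare it with zero, and store processor_id
     * only if it was zero, all as one indivisible step. */
    return atomic_compare_exchange_strong(&control_word, &expected, processor_id);
}

/* Give the resource back by storing zero again. */
void release_resource(void) {
    atomic_store(&control_word, 0);
}
```

If two processors call `try_acquire` at the same moment, at most one of them observes the zero and completes its store; the other sees a nonzero value and is denied access until `release_resource` is called.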
It should be noted that the "load, compare, store" atomic only appears to be indivisible, and that typically the processor instruction set does not actually support such a multiple function instruction in a single operation. Rather, the multiprocessor system has a special and dedicated mechanism in place to ensure that while one process is executing an atomic instruction, no other process can manipulate any objects accessed within the instruction.
Once a processor has gained control of the shared object (e.g., by storing its identifier number in the corresponding memory location), normal cache coherency protocols take effect. An atomic has store semantics which inform all other cache memories to invalidate their own copy of the shared object upon the store of the processor gaining control. This ensures that only one valid copy of the resource exists. While one processor has taken exclusive control of the shared object, the processes of that one processor appear atomic to the other processors with respect to that object.
The "compare and swap" atomic is implemented by the provision of special hardware within the machine, i.e., the dedicated atomic unit 114 of the machine as shown in FIG. 1. The atomic unit 114 effectively takes control of the entire system by seizing the memory line and sequencing through a load, a compare, and then a store. Typically, the atomic unit 114 takes charge of a data cache for doing the load part of an atomic, and it takes charge of a store queue for doing the store part of the atomic. Also, the atomic unit 114 is frequently equipped with its own comparator for the compare part of the atomic. The atomic unit 114 thus contains state machines that take control of the existing data cache and the existing store queue, and generally it has its own compare circuitry.
There are inherent drawbacks to the conventional implementation of atomics. For example, it is necessary to specially equip the machine with the separate atomic unit, with the resulting hardware and space requirements associated therewith. Perhaps more important, however, is the disruption in normal processing caused by the actions of the dedicated atomic unit. That is, to implement the atomic "compare and swap", the atomic unit must cause the processor to cease normal operations pending completion of the atomic. This creates the dual disadvantages of slowing processing speeds and requiring extra control logic which is often difficult to debug. In fact, debugging difficulty is a significant problem associated with the current implementation of atomics.