1. Field of the Invention
This invention is related to processors and, more particularly, to synchronization mechanisms for multiprocessor systems.
2. Description of the Related Art
Processors designed for use in multiprocessing systems typically support some sort of mechanism for synchronizing processes executing on the various processors. For example, certain sections of code may be designated as “critical sections”. Critical sections may update variables shared by the processes, read or write files, etc. Typically, the processes are synchronized such that at most one process at any given time is executing the critical section. As another example, the processes may share certain data areas in memory. Access to the shared data areas may be controlled in a similar fashion, synchronizing such that at most one process has access (or perhaps at most one process has write access, with other processes possibly having read-only access) to the shared data area at any given time.
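As an illustration only, the critical-section discipline described above can be sketched in software. The sketch assumes Python threads stand in for the processes, and the names `counter`, `counter_lock`, `worker`, and `run` are hypothetical; it does not represent any particular processor's mechanism.

```python
import threading

counter = 0                      # variable shared by all workers (hypothetical)
counter_lock = threading.Lock()  # guards entry to the critical section


def worker(iterations):
    """Repeatedly enter the critical section and update the shared variable."""
    global counter
    for _ in range(iterations):
        with counter_lock:       # at most one thread is inside at any given time
            counter += 1         # critical section: update the shared variable


def run(num_workers=4, iterations=10_000):
    """Start the workers, wait for them to finish, and return the final count."""
    global counter
    counter = 0
    threads = [
        threading.Thread(target=worker, args=(iterations,))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

Because every update occurs inside the critical section, `run(4, 1000)` returns 4000 regardless of how the threads are scheduled.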
Support for synchronization has been provided by processors in the form of an atomic read-modify-write of a memory location. The atomic read-modify-write can be used to implement various synchronization primitives such as test and set, exchange, fetch and add, compare and swap, etc. Synchronization may be managed by using atomic read-modify-writes to designated memory locations to communicate whether or not a critical section or shared data area is available, to indicate which process currently has access to the critical section or shared data area, etc. The designated memory locations are often referred to as “semaphores”.
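The primitives named above may be modeled in software as follows. This is a minimal sketch, assuming an internal lock stands in for the atomicity that hardware would provide; the class `AtomicCell` and the helpers `try_acquire` and `release` are hypothetical names, and the encoding of the semaphore (0 for available, 1 for taken) is likewise an assumption.

```python
import threading


class AtomicCell:
    """Software model of one memory location with atomic read-modify-write."""

    def __init__(self, value=0):
        self._value = value
        self._guard = threading.Lock()  # models the hardware atomicity guarantee

    def _rmw(self, modify):
        # Read the old value, write the modified value, return the old value,
        # all as one indivisible step.
        with self._guard:
            old = self._value
            self._value = modify(old)
            return old

    def load(self):
        with self._guard:
            return self._value

    def test_and_set(self):
        return self._rmw(lambda old: 1)

    def exchange(self, new):
        return self._rmw(lambda old: new)

    def fetch_and_add(self, delta):
        return self._rmw(lambda old: old + delta)

    def compare_and_swap(self, expected, new):
        # Writes `new` only if the location still holds `expected`.
        return self._rmw(lambda old: new if old == expected else old)


def try_acquire(semaphore):
    """Return True if the caller obtained the semaphore (it was 0, now 1)."""
    return semaphore.test_and_set() == 0


def release(semaphore):
    """Mark the semaphore as available again."""
    semaphore.exchange(0)
```

In this model, a first call to `try_acquire` on a fresh `AtomicCell(0)` returns True, and further calls return False until `release` is called.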
Some processors may support atomic read-modify-writes using a lock mechanism. With a lock mechanism, when a processor accesses a memory location, other accesses to that memory location are prevented until the processor releases the lock. The atomicity of the read-modify-write operation to the memory location is guaranteed by preventing other processors from accessing that memory location. Lock mechanisms may be problematic in practice. For example, if the lock is implemented by locking a resource for accessing memory (e.g. a shared bus), deadlock may result (especially in coherent systems). Lock mechanisms may also be difficult to implement in larger systems (e.g. systems with multiple levels of interconnect between processors).
Another approach for providing an atomic read-modify-write mechanism is the load-linked/store conditional mechanism. In this mechanism, two types of instructions are provided: the load-linked and the store conditional. Generally, a load-linked instruction and a store conditional instruction to the same address are used in pairs. The load-linked instruction operates similarly to a typical load instruction, but also causes the processor to monitor the target address of the load (the address of the data accessed by the load). The store conditional instruction conditionally stores to the target address based on whether or not the target address has been updated by another processor/device between the load-linked instruction and the store conditional instruction. Other conditions may cause the store not to occur as well. The store conditional may provide an indication of whether or not the store was performed. Subsequent instructions may test this indication and either branch back to the load-linked instruction to attempt the read-modify-write operation again (if the store was not performed) or continue processing (if the store was performed). With the load-linked/store conditional mechanism, other processors may access the memory location for which the atomic read-modify-write is being attempted. If the location is modified between the load-linked and the store conditional, the store conditional fails and the load-linked/store conditional sequence is repeated. When the store conditional completes successfully, an atomic read-modify-write of the location has been performed.
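The retry loop formed by a load-linked/store conditional pair may be sketched with a toy software model. The class `LinkedMemory`, the function `atomic_add`, and the `interfere` hook (used only to inject another processor's store between the two instructions) are hypothetical names introduced for illustration; real processors track the monitored address in hardware and may fail the store conditional for additional reasons not modeled here.

```python
class LinkedMemory:
    """Toy model of memory with per-processor load-linked reservations."""

    def __init__(self):
        self.data = {}         # address -> value
        self.reservation = {}  # processor id -> address being monitored

    def load_linked(self, cpu, addr):
        self.reservation[cpu] = addr  # begin monitoring the target address
        return self.data.get(addr, 0)

    def store(self, cpu, addr, value):
        """Ordinary store: update memory and break other reservations on addr."""
        self.data[addr] = value
        for other, monitored in list(self.reservation.items()):
            if other != cpu and monitored == addr:
                del self.reservation[other]

    def store_conditional(self, cpu, addr, value):
        """Store only if this processor's reservation on addr is still intact."""
        if self.reservation.get(cpu) != addr:
            return False              # the address was updated: do not store
        self.store(cpu, addr, value)
        del self.reservation[cpu]     # the reservation is consumed
        return True


def atomic_add(mem, cpu, addr, delta, interfere=None):
    """Add delta to mem[addr] atomically; return (old value, attempts made)."""
    attempts = 0
    while True:
        attempts += 1
        old = mem.load_linked(cpu, addr)
        if interfere is not None and attempts == 1:
            interfere()               # models another processor's update
        if mem.store_conditional(cpu, addr, old + delta):
            return old, attempts      # store performed: sequence complete
        # store not performed: branch back to the load-linked and retry
```

If another processor stores to the address between the first load-linked and store conditional, the first attempt fails and the second attempt operates on the updated value.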
Processors and other devices which couple to a shared interconnect may use the order of transactions on the interconnect to determine the order in which processors/devices update the memory location targeted by a load-linked/store conditional pair. For example, if various processors have a shared copy of the data at the memory location (read via the load-linked instruction), a first processor may perform a transaction to the memory location on the interconnect in response to the store conditional instruction (to gain exclusive access). Since this transaction occurs on the interconnect before transactions by other processors/devices, the first processor should update the memory location (i.e. complete its store conditional instruction successfully). Other processors may perform transactions to gain exclusive access to the memory location before the first processor completes the store conditional instruction (e.g. the first processor may be waiting to receive data for the transaction that provides the first processor with exclusive access). To prevent the store conditional from failing, the first processor may delay the effects of state changes in response to the other processors' transactions until after the outstanding transaction by the first processor is completed. Such action may also be used to guarantee forward progress in general (e.g. permitting a processor to use the data at least once before passing the data on to a subsequently accessing processor in response to a snoop).
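The ordering rule described above may be illustrated with a highly simplified model. It assumes a single interconnect, one reservation per processor, and that each exclusive-access transaction is snooped by every other processor in interconnect order; the names `Processor`, `Interconnect`, and `run_single_node` are hypothetical.

```python
class Processor:
    """Toy model of a processor's load-linked reservation and snoop handling."""

    def __init__(self, name):
        self.name = name
        self.reservation = False   # set by the load-linked instruction
        self.outstanding = False   # own exclusive transaction issued, not complete
        self.deferred_snoops = 0   # snoops whose effects are being delayed

    def load_linked(self):
        self.reservation = True

    def snoop(self):
        """Another agent's exclusive transaction appears on the interconnect."""
        if self.outstanding:
            self.deferred_snoops += 1  # ordered after ours: delay its effect
        else:
            self.reservation = False   # ordered before ours: reservation lost

    def complete_store_conditional(self):
        success = self.reservation
        self.outstanding = False
        self.reservation = False       # the reservation is consumed either way
        self.deferred_snoops = 0       # delayed state changes take effect now
        return success


class Interconnect:
    """Single ordering point: transactions are seen in the order issued."""

    def __init__(self, processors):
        self.processors = processors

    def issue_exclusive(self, requester):
        requester.outstanding = True
        for p in self.processors:
            if p is not requester:
                p.snoop()


def run_single_node():
    p0, p1 = Processor("P0"), Processor("P1")
    bus = Interconnect([p0, p1])
    p0.load_linked()
    p1.load_linked()
    bus.issue_exclusive(p0)  # first on the interconnect
    bus.issue_exclusive(p1)  # arrives before P0 completes; P0 defers its effect
    return {p.name: p.complete_store_conditional() for p in (p0, p1)}
```

In this model, `run_single_node()` reports success for P0 only: P1 lost its reservation when it snooped P0's earlier transaction, while P0 deferred the effect of P1's later transaction.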
While the above mechanism may provide proper operation in a system in which the interconnect is the only ordering point, the above mechanism may not function properly if the processors/devices and interconnect are one node of a multinode system (e.g. a distributed shared memory system). In a multinode system, a processor in each node may perform the transaction to obtain exclusive access to the memory location at about the same time. Internode communications may be used to maintain coherency across the nodes, and the internode communications may result in transactions on the interconnect in each node. However, the effects of these transactions would be delayed until the outstanding transactions in each of the above processors have completed. Thus, one processor in each node may determine that it has successfully completed a store conditional to the same memory location, and the synchronization among the multiple nodes would be lost.
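The multinode failure described above may be illustrated with a deliberately simplified model. It assumes two nodes with one contending processor each, and that the internode coherency traffic appears in each node as a snooped transaction only after the local processor's own transaction has been issued; the names `Processor` and `run_two_nodes` are hypothetical, and the model omits the coherency protocol itself.

```python
class Processor:
    """Toy model of a processor that delays snoop effects while it has an
    exclusive-access transaction outstanding on its node's interconnect."""

    def __init__(self, name):
        self.name = name
        self.reservation = False   # set by the load-linked instruction
        self.outstanding = False   # own exclusive transaction issued, not complete
        self.deferred_snoops = 0   # snoops whose effects are being delayed

    def load_linked(self):
        self.reservation = True

    def issue_exclusive(self):
        # First such transaction on this node's own interconnect.
        self.outstanding = True

    def snoop(self):
        """A transaction caused by internode coherency traffic is observed."""
        if self.outstanding:
            self.deferred_snoops += 1  # effect delayed until completion
        else:
            self.reservation = False

    def complete_store_conditional(self):
        success = self.reservation
        self.outstanding = False
        self.reservation = False       # the reservation is consumed either way
        self.deferred_snoops = 0       # delayed state changes take effect now
        return success


def run_two_nodes():
    a = Processor("node0.P0")
    b = Processor("node1.P0")
    for p in (a, b):
        p.load_linked()
    # Each processor is first on its own node's interconnect.
    a.issue_exclusive()
    b.issue_exclusive()
    # The other node's request later appears as a local transaction,
    # but each processor delays its effect.
    a.snoop()
    b.snoop()
    return {p.name: p.complete_store_conditional() for p in (a, b)}
```

In this model, `run_two_nodes()` reports success for both processors, which corresponds to the loss of synchronization among the nodes: each node's interconnect is a local ordering point, but nothing orders the two nodes relative to each other.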