1. Technical Field
The present invention relates in general to data processing and, in particular, to allocating and accessing resources within a data processing system. In at least one embodiment, the present invention relates still more particularly to a method and system for efficiently allocating and accessing promotion facilities, such as locks, in a data processing system.
2. Description of the Related Art
In shared memory multiprocessor (MP) data processing systems, each of the multiple processors in the system may access and modify data stored in the shared memory. In order to synchronize access to a particular granule (e.g., cache line) of memory between multiple processors, programming models often require a processor to acquire a lock associated with the granule prior to modifying the granule and release the lock following the modification.
In a multiprocessor computer system, multiple processors may be independently attempting to acquire the same lock. In the event that a processor contending for a lock successfully acquires the lock, the cache line containing the lock is transmitted via the system bus from system memory or the cache hierarchy of another processor and loaded into the processor's cache hierarchy. Thus, the acquisition and release of locks in conventional data processing systems can be characterized as the movement of exclusively held cache lines between the data caches of various processors.
Lock acquisition and release is commonly facilitated utilizing special memory access instructions referred to as load-reserve and store-conditional instructions. In shared memory MP data processing systems that support load-reserve and store-conditional instructions, each processor within the system is equipped with a reservation register. When a processor executes a load-reserve to a memory granule, the processor loads some or all of the contents of the memory granule into one of the processor's internal registers and the address of the memory granule into the processor's reservation register. The requesting processor is then said to have a reservation with respect to the memory granule. The processor may then perform an atomic update to the reserved memory granule utilizing a store-conditional instruction.
When a processor executes a store-conditional to a memory granule for which the processor holds a reservation, the processor stores the contents of a designated register to the memory granule and then clears the reservation. If the processor does not have a reservation for the memory granule, the store-conditional instruction fails and the memory update is not performed. In general, the processor's reservation is cleared if a remote processor requests exclusive access to the memory granule for purposes of modifying it (the request is made visible to all processors on a shared bus) or the reserving processor executes a store-conditional instruction. If only one reservation is permitted per processor, a processor's current reservation will also be cleared if the processor executes a load-reserve to another memory granule.
A typical instruction sequence for lock acquisition and release utilizing load-reserve (lwarx) and store-conditional (stwcx) instructions is as follows:
Aload X!read lock valuecmpi!compare to determine if lock availablebc A!loop back if lock not availableBlwarx X!attempt to obtain reservation for lockcmpi!determine if obtained reservation for lockbc A!loop back if no reservation obtainedCstwcx X!attempt to set lock to “locked” statebc A!loop back if store-conditional failed. . .!do work on shared data to which access issynchronized by the lockstore X!release lock by resetting to “unlocked” state
As indicated, the typical instruction sequence includes at least two separate branch “loops”—one (identified by “B”) that is conditioned upon the processor obtaining a valid reservation for the lock through successful execution of the load-reserve instruction, and another (identified by “C”) conditioned upon the processor successfully updating the lock to a “locked” state through execution of the store-conditional instruction while the processor has a valid reservation. The lock acquisition sequence may optionally include a third branch loop (identified by “A”) in which the processor determines whether the lock is available prior to seeking a reservation for the lock.
This conventional lock acquisition sequence incurs high overhead not only because of its length but also because of the conditional nature of reservations. That is, a first processor may lose a reservation for a lock before successfully acquiring the lock (through execution of a store-conditional instruction) if a second processor stores to (or acquires ownership of) the lock first. Consequently, if a lock is highly contended, a processor may make a reservation for a lock and lose the reservation many times prior to successfully acquiring the lock through execution of a store-conditional instruction.
At least one processor manufacturer has tried to address this problem by implementing a “brute force” solution in which a processor executing a load-reserve instruction is granted exclusive access to the interconnect. That is, while the reservation is held by the processor, only the processor executing the load-reserve instruction is permitted to master operations on the interconnect, and all other processors are “locked out,” not just from accessing a particular data granule, but from initiating any operation on the interconnect. Consequently, the processors locked out of the interconnect may stall for lack of data while the reservation is held. Obviously, this solution does not scale well, particularly for systems running code in which locks are highly contended.