1. Field of the Invention
This invention is related to the field of processors and, more particularly, to store exclusive and memory barrier handling in processors.
2. Description of the Related Art
Certain regions of software code, referred to as “critical regions,” require controlled entry and exit. For example, in multiprocessor and/or multithreaded environments, one or more independent code sequences can access a shared data structure. The code that performs the accesses can be a critical region. If more than one processor/thread executes the critical region concurrently, the results of the execution may not be predictable and/or may not be as expected.
One mechanism for controlling access to the critical region of code is a spin lock on a memory location. Any code desiring to execute the critical region reads the memory location, checks its current value, and conditionally writes a value back. The value in the memory location indicates the status of the critical region (e.g. available or in use). If the value read by a given processor/thread indicates available, that processor/thread can write back a value indicating in use. For example, zero can indicate available and a non-zero value can indicate in use. In some cases, the non-zero value can carry additional information (e.g. identifying the processor or thread that is using the critical region). To operate properly, the read and the corresponding write by a processor/thread are performed atomically (i.e. one processor successfully writes to the location to indicate in use, and other processors are prevented from writing the location even if the other processors read the available value). In this fashion, only one processor/thread can detect that the memory location indicates available and successfully enter the critical region, even if the reads and writes from multiple processors/threads overlap in time.
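The read, check, and conditional write described above can be sketched as a small behavioral model. This is an illustration only, not an implementation from this document: the names `LockWord`, `try_acquire`, `release`, and `run_contenders` are hypothetical, and a Python `threading.Lock` is assumed as a stand-in for whatever hardware mechanism makes the read and write atomic.

```python
import threading
import time

AVAILABLE = 0  # per the text: zero indicates available, non-zero indicates in use


class LockWord:
    """Hypothetical model of the memory location guarding the critical region."""

    def __init__(self):
        self.value = AVAILABLE
        # Assumption: this lock stands in for the hardware atomicity guarantee.
        self._atomic = threading.Lock()

    def try_acquire(self, owner_id):
        """Atomically read the location, check it, and conditionally write it."""
        with self._atomic:
            if self.value != AVAILABLE:
                return False       # in use: the caller spins and retries
            self.value = owner_id  # non-zero value identifies the owner
            return True

    def release(self):
        """Mark the critical region as available again."""
        self.value = AVAILABLE


def run_contenders(num_threads=4, iterations=200):
    """Have several threads spin on one lock word and update shared data."""
    lock_word = LockWord()
    state = {"counter": 0}

    def worker(owner_id):
        for _ in range(iterations):
            while not lock_word.try_acquire(owner_id):
                time.sleep(0)      # spin, yielding so the current owner can run
            state["counter"] += 1  # critical region: access to shared data
            lock_word.release()

    # Owner identifiers start at 1 so that they never collide with AVAILABLE.
    threads = [
        threading.Thread(target=worker, args=(i + 1,)) for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["counter"]
```

In this model, a second caller that finds a non-zero value gets `False` from `try_acquire` and must retry, which corresponds to the spinning behavior of the lock.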
Some processors implement load exclusive and store exclusive instructions to support atomic access. The load exclusive instruction causes monitoring hardware to begin monitoring an address accessed by the load exclusive instruction. If the corresponding store exclusive occurs prior to interference by another processor/thread, the store exclusive completes successfully and the processor/thread that completes the store exclusive can execute the critical region. Other processors/threads detect that the store exclusive occurred, and their own store exclusive instructions fail, which, for example, causes those processors/threads to re-execute the spin lock loop.
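The interaction between load exclusive, store exclusive, and the monitoring hardware can be illustrated with a simplified model. The class `MonitoredMemory`, its methods, the address `0x1000`, and the function `demo_interleaving` are all hypothetical names chosen for this sketch. The model assumes one monitored address per processor and that a successful store exclusive clears every monitor on that address; real monitoring hardware may track addresses differently.

```python
class MonitoredMemory:
    """Hypothetical model of memory with one exclusive monitor per processor."""

    def __init__(self):
        self.cells = {}     # address -> value; unwritten addresses read as 0
        self.monitors = {}  # processor id -> address currently monitored

    def load_exclusive(self, cpu, addr):
        """Read addr and begin monitoring it on behalf of cpu."""
        self.monitors[cpu] = addr
        return self.cells.get(addr, 0)

    def store_exclusive(self, cpu, addr, value):
        """Write addr only if cpu's monitor on addr is still intact."""
        if self.monitors.get(cpu) != addr:
            return False  # interference detected: the store exclusive fails
        self.cells[addr] = value
        # Assumption: a successful store clears all monitors on this address,
        # so pending store exclusives from other processors will fail.
        for other in [c for c, a in self.monitors.items() if a == addr]:
            del self.monitors[other]
        return True


def demo_interleaving():
    """Two processors race for the same lock word; only one can win."""
    mem = MonitoredMemory()
    lock_addr = 0x1000                                   # hypothetical address
    seen_by_1 = mem.load_exclusive(1, lock_addr)         # reads 0 (available)
    seen_by_2 = mem.load_exclusive(2, lock_addr)         # also reads 0
    won_1 = mem.store_exclusive(1, lock_addr, 1)         # succeeds
    won_2 = mem.store_exclusive(2, lock_addr, 2)         # fails: monitor cleared
    retry_seen_by_2 = mem.load_exclusive(2, lock_addr)   # retry reads "in use"
    return seen_by_1, seen_by_2, won_1, won_2, retry_seen_by_2
```

Both processors read the available value, yet only the first store exclusive succeeds. The second processor's store exclusive fails, and on re-executing the loop it reads the in-use value written by the first.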
To ensure that memory accesses within the critical region are not performed out of order with respect to acquiring access to the critical region, the spin lock loop can end with a data memory barrier instruction. The data memory barrier is defined to cause all previous accesses to become globally visible prior to completion of the data memory barrier. The data memory barrier also prevents subsequent memory operations from being performed until the data memory barrier is complete.
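The global-visibility property of the data memory barrier can be sketched with a toy model. The store buffer used here is an assumption introduced purely for illustration (this document does not describe how accesses come to be not yet globally visible), and the names `Core`, `data_memory_barrier`, and `demo_barrier` are hypothetical.

```python
from collections import deque


class Core:
    """Hypothetical processor model with a private store buffer."""

    def __init__(self, shared_memory):
        self.shared = shared_memory  # globally visible memory: address -> value
        self.store_buffer = deque()  # writes not yet globally visible

    def store(self, addr, value):
        """Buffer a write; other cores cannot observe it yet."""
        self.store_buffer.append((addr, value))

    def load(self, addr):
        """Read the youngest buffered write to addr, else global memory."""
        for buffered_addr, value in reversed(self.store_buffer):
            if buffered_addr == addr:
                return value
        return self.shared.get(addr, 0)

    def data_memory_barrier(self):
        """Return only after all previous writes are globally visible."""
        while self.store_buffer:
            addr, value = self.store_buffer.popleft()  # drain oldest first
            self.shared[addr] = value


def demo_barrier():
    """Show what a second core observes before and after the barrier."""
    shared = {}
    core0, core1 = Core(shared), Core(shared)
    lock_addr = 0x1000                # hypothetical lock word address
    core0.store(lock_addr, 1)         # core0 marks the lock in use
    before = core1.load(lock_addr)    # core1 still sees "available"
    core0.data_memory_barrier()       # previous accesses become visible
    after = core1.load(lock_addr)     # core1 now sees "in use"
    return before, after
```

Because `data_memory_barrier` does not return until the buffer is empty, any access the model issues afterward is ordered after the earlier writes, which mirrors the second property stated above.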
Spin lock loops built from load exclusive, store exclusive, and data memory barrier instructions operate correctly. However, the performance of the processors (e.g. in terms of average number of instructions executed per clock cycle) tends to degrade because these operations have long latency and are also synchronizing. Since spin locks are frequently executed, the effect on performance may be significant. The load exclusive, store exclusive, and data memory barrier instructions can also be used in program sequences other than spin locks.