1. Field of the Invention
The present invention generally relates to an apparatus and method for efficiently sharing data, where the sharing is supported by hardware cache coherency and coordinated in software with semaphore instructions.
2. Description of Related Art
Multiprocessor systems are used to increase performance over single-processor systems by having more than one processor work on a problem at a time. For such systems to work effectively, data sharing is supported by hardware cache coherency and coordinated in software with semaphore instructions.
Hardware cache coherency ensures that software running on different processors has a consistent view of the current value of all memory data, even though each processor caches only a portion of that data. In a typical coherency system, a processor may obtain data as a shared copy or as a private copy.
More than one processor may simultaneously have a shared copy of the same block of data, and all shared copies of the same data will be identical. A shared copy can satisfy any read request, typically caused by a load instruction, but it cannot be modified, as would be necessary to satisfy a store instruction or semaphore instruction.
Only a single processor may have a private copy of data at a time, and that copy is guaranteed to be the only copy of the data, i.e., no shared copies exist. A private copy of data can be used to satisfy both read requests and modification requests.
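The shared and private ownership rules described above can be illustrated with a small software model. The following Python sketch is illustrative only: the class name `Directory`, its methods, and the choice to demote a private copy to a shared copy when another processor reads the block are assumptions made for this example and are not taken from any particular hardware design.

```python
class Directory:
    """Toy model (hypothetical) of which processors hold a block, and how."""

    def __init__(self):
        self.sharers = {}  # block address -> set of processor ids with a shared copy
        self.owner = {}    # block address -> processor id holding the private copy

    def read(self, cpu, addr):
        """Satisfy a load: any valid copy, shared or private, is sufficient."""
        owner = self.owner.get(addr)
        if owner == cpu:
            return "private"  # a private copy also satisfies reads
        if owner is not None:
            # Assumption: the previous private holder is demoted to a sharer.
            del self.owner[addr]
            self.sharers.setdefault(addr, set()).add(owner)
        self.sharers.setdefault(addr, set()).add(cpu)
        return "shared"

    def write(self, cpu, addr):
        """Satisfy a store or semaphore: obtain the only copy of the block."""
        self.sharers.pop(addr, None)  # every shared copy is invalidated
        self.owner[addr] = cpu
        return "private"

    def holders(self, addr):
        """Return {processor id: ownership level} for one block."""
        owner = self.owner.get(addr)
        if owner is not None:
            return {owner: "private"}
        return {cpu: "shared" for cpu in self.sharers.get(addr, set())}
```

In this model, two processors that each read the same block both appear in `holders` as "shared", while a subsequent `write` by either one leaves exactly one "private" holder and no sharers.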
If a processor has a valid copy of data in its cache, with a sufficient level of ownership, the time required to execute an instruction that accesses the data in the cache is small, typically one to several cycles. On the other hand, if the cache does not have the data, or the level of ownership is insufficient, an external request must be issued to obtain the data and/or the ownership needed to satisfy the instruction. That external request is much slower than the cache-resident case, often 10 to 100 times slower.
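The effect of this cost difference on average access time can be estimated with a simple weighted average. The Python sketch below is a back-of-the-envelope model only; the function name `effective_access_cycles` and the example figures (a 2-cycle hit, a 50-times slowdown, a 95% hit rate) are assumptions chosen for illustration and are not measurements.

```python
def effective_access_cycles(hit_rate, hit_cycles, slowdown):
    """Average cycles per access for a given cache hit rate.

    hit_rate   -- fraction of accesses satisfied from the cache with
                  sufficient ownership (0.0 to 1.0)
    hit_cycles -- cost of a cache-resident access, in cycles
    slowdown   -- how many times slower an external request is
    """
    miss_cycles = hit_cycles * slowdown
    return hit_rate * hit_cycles + (1.0 - hit_rate) * miss_cycles


# Assumed example: even with 95% of accesses hitting in the cache, the
# remaining 5% dominate the average cost.
average = effective_access_cycles(hit_rate=0.95, hit_cycles=2, slowdown=50)
```

Under these assumed figures the average works out to roughly 6.9 cycles per access, more than three times the cost of a cache-resident access.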
In order for software to coordinate the sharing of data, there are special "semaphore" operations that typically read and modify a data location "atomically," where hardware guarantees that no other processor has access to the data location between the read and the modification.
Even though semaphore operations are a very small portion of the instruction mix, they cause a significant portion of the coherency traffic, including cache misses and private ownership requests. Since semaphore operations are used to support data sharing, it is likely that the data requested for a semaphore operation is currently resident in another processor. For certain types of semaphore operations, for example the Cmpxchg instruction, two coherency operations are typically required for a single semaphore operation, which is twice the desirable overhead.
As shown in FIG. 1, a typical semaphore process utilizes a load and a Cmpxchg instruction. The Cmpxchg instruction is often used when the new value of the semaphore data is dependent on the previous value. Thus, the semaphore process will typically "load" the current value of the semaphore data, as shown at step 11, perform the computation, as shown at step 12, then issue a Cmpxchg instruction to test if the data location is private, at step 13. If the test at step 13 indicates that the data location is private, step 15 then tests if the current value of the semaphore data is the same value that was "loaded." If the test indicates that the value is the same value as "loaded," then the new value is stored into the semaphore location, at step 16. If, however, the value is not the same value as "loaded," then the "condition" will indicate a bad data access, at step 17.
In practice, a load instruction only requires shared ownership to complete, and will request shared ownership in its coherency operation, as indicated in step 11. Later, when the Cmpxchg instruction is executed, as shown at step 13, the Cmpxchg instruction requires private ownership, so a second coherency operation must at that point be issued to complete the Cmpxchg instruction, as shown at step 14. At that time, the test at step 15 is performed to check that the current value of the semaphore data is the same value as "loaded," and the process continues as described above. Other semaphore operations may have the same problem, depending on the usage.
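The two coherency operations incurred by the load and Cmpxchg sequence of FIG. 1 can be counted with a small model. The Python sketch below is illustrative only: the class name `Cache`, its attributes, and the single-processor simplification are assumptions made for this example, and the step numbers in the comments refer to FIG. 1 as described above.

```python
class Cache:
    """Toy single-processor cache (hypothetical) that counts ownership requests."""

    def __init__(self, memory):
        self.memory = memory     # dict standing in for the shared backing store
        self.state = {}          # address -> "shared" or "private"
        self.coherency_ops = 0   # external ownership requests issued so far

    def _acquire(self, addr, level):
        have = self.state.get(addr)
        if have == "private" or have == level:
            return               # ownership already sufficient: cache hit
        self.coherency_ops += 1  # external request for data and/or ownership
        self.state[addr] = level

    def load(self, addr):
        self._acquire(addr, "shared")    # step 11: shared ownership suffices
        return self.memory[addr]

    def cmpxchg(self, addr, expected, new):
        self._acquire(addr, "private")   # steps 13-14: private ownership needed
        if self.memory[addr] != expected:
            return False                 # step 17: value changed since the load
        self.memory[addr] = new          # steps 15-16: value matches, store it
        return True


memory = {0x100: 5}
cache = Cache(memory)
old = cache.load(0x100)                      # first coherency operation
updated = cache.cmpxchg(0x100, old, old + 1)  # second coherency operation
```

After this sequence, `cache.coherency_ops` is 2 even though only one semaphore update was performed, which is the overhead the passage above describes.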
As shown in FIG. 2, one semaphore process implementation that addressed this problem in the past is the Load-Linked/Store-Conditional pair of instructions in the MIPS architecture. The Load-Linked instruction would gain private ownership of a cache line while satisfying the load, and mark the line in the cache in a particular "linked" state, as shown at step 21. If, after the computation is performed, as shown at step 22, the accessed line stays resident in the processor's cache until the Store-Conditional instruction is executed, as shown at step 23, then the store will complete and the "condition" will indicate success at step 24.
The Load-Linked/Store-Conditional semaphore process has a major disadvantage: when multiple processors are trying to operate on the same semaphore, another Load-Linked/Store-Conditional pair of instructions can grab the semaphore location away. Then, when the Store-Conditional instruction is executed, as shown at step 23, the "condition" will indicate a bad data access, as shown at step 25. This raises the possibility that no processor will succeed in actually storing to the semaphore, causing system deadlock and preventing forward progress.
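The failure mode described above can be reproduced with a small model of a single semaphore location. The Python sketch below is illustrative only: the class name `LinkedMemory`, the function `contended_round`, and the particular interleaving of two processors are assumptions made for this example, and the step numbers in the comments refer to FIG. 2.

```python
class LinkedMemory:
    """Toy model (hypothetical) of one semaphore location under LL/SC."""

    def __init__(self, value=0):
        self.value = value
        self.linked_cpu = None  # processor holding the line in the "linked" state

    def load_linked(self, cpu):
        # Step 21: gaining private ownership removes the line from any other
        # cache, which breaks that other processor's link.
        self.linked_cpu = cpu
        return self.value

    def store_conditional(self, cpu, new):
        # Step 23: the store completes only if the line stayed resident.
        if self.linked_cpu != cpu:
            return False        # step 25: the line was taken away
        self.value = new
        self.linked_cpu = None
        return True             # step 24: success


def contended_round(sem):
    """One assumed interleaving in which processors 0 and 1 defeat each other."""
    a = sem.load_linked(0)
    b = sem.load_linked(1)                     # steals the line from processor 0
    ok_a = sem.store_conditional(0, a + 1)     # fails: link was broken
    sem.load_linked(0)                         # processor 0 retries, stealing it back
    ok_b = sem.store_conditional(1, b + 1)     # fails: link was broken
    return ok_a, ok_b
```

In this interleaving neither Store-Conditional succeeds and the semaphore value is unchanged. If the same pattern repeats on every retry, no processor makes forward progress.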
Heretofore, processors have lacked the ability to indicate to the hardware cache the need to try to maintain private ownership. Accordingly, it is desirable to have a new instruction called "Load-Bias" which, in addition to performing a normal load operation, requests a private copy of the data, and hints to the hardware cache to try to maintain private ownership until the next memory reference from that processor.
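The intended benefit of such an instruction can be sketched with a small model. The Python sketch below is illustrative only and does not describe an actual implementation: the class name `BiasCache`, the `snoop` method, and the choice to refuse another processor's request while the hint is active are all assumptions made for this example, since this section does not specify how the hardware cache honors the hint.

```python
class BiasCache:
    """Toy cache (hypothetical) supporting a Load-Bias style hint."""

    def __init__(self, memory):
        self.memory = memory     # dict standing in for the shared backing store
        self.state = {}          # address -> "private" when held privately
        self.biased = None       # address the cache is trying to hold on to
        self.coherency_ops = 0   # external ownership requests issued so far

    def _acquire_private(self, addr):
        if self.state.get(addr) != "private":
            self.coherency_ops += 1
            self.state[addr] = "private"

    def load_bias(self, addr):
        # Normal load, but private ownership is requested up front.
        self._acquire_private(addr)
        self.biased = addr
        return self.memory[addr]

    def cmpxchg(self, addr, expected, new):
        self._acquire_private(addr)      # a hit if the line is still private
        success = self.memory[addr] == expected
        if success:
            self.memory[addr] = new
        self.biased = None               # hint ends with this memory reference
        return success

    def snoop(self, addr):
        """Another processor requests the line; True if it is given up."""
        if self.biased == addr:
            return False  # assumption: the request is deferred while biased
        self.state.pop(addr, None)
        return True
```

In this model a `load_bias` followed by a `cmpxchg` on the same address leaves `coherency_ops` at 1, compared with the two coherency operations of the load and Cmpxchg sequence of FIG. 1.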