Computing devices and computing systems often operate in environments that have multiple processors. In many of these multiprocessor environments, the processors have access to the same data resources. In such an environment, two cores (computer processors) can compete for the same cache line (the same piece of data), and the data is moved back and forth between the caches of the two processors. There is a need to ensure that, at any given time, only one processor is allowed to modify the data. In addition, when the data is modified, the other processors holding the line must be informed, because the value that a processor has cached for that data may no longer be accurate.
In Power hardware, two special instructions, lwarx (load word and reserve indexed) and stwcx. (store word conditional indexed), allow data to be updated atomically: the data is read, modified, and stored back as one action. Because the update is a single action, no other observer can see an intermediate state. Also, if two processors perform the same kind of update, one processor's update to the line completes before the other processor's update is sent.
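Why the read-modify-write must appear as one action can be seen in a small Python sketch. The interleaving below is hypothetical and written out by hand, not traced from a Power system: two processors each add 1 to the same word, but because both read before either stores, one update is lost.

```python
# Shared word, initially 10; two processors each intend to add 1.
shared = 10

# Hypothetical interleaving of two NON-atomic read-modify-write sequences.
r0 = shared        # processor 0 reads 10
r1 = shared        # processor 1 reads 10 before processor 0 stores
shared = r0 + 1    # processor 0 stores 11
shared = r1 + 1    # processor 1 stores 11, overwriting processor 0's update

print(shared)      # 11, although two increments should give 12
```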
A lwarx instruction obtains a reservation on a line, which tells the hardware that the line is about to be modified and that the processor wants the right to modify it. A subsequent stwcx. instruction commits the change back to the data only if the reservation is still held. The store fails if another processor had a reservation on the line and committed its change first; software then typically retries the sequence.
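The reservation behavior described above can be sketched as a toy Python model. The class `ReservationLine` and its methods `lwarx` and `stwcx` are hypothetical names that only simulate the semantics; the sketch assumes that any successful conditional store cancels every outstanding reservation on the line, which simplifies the real reservation rules.

```python
from dataclasses import dataclass, field


@dataclass
class ReservationLine:
    """Toy model of one line with lwarx/stwcx.-style reservations (hypothetical)."""
    value: int = 0
    reservations: set = field(default_factory=set)

    def lwarx(self, cpu: int) -> int:
        # Load the word and record a reservation for this processor.
        self.reservations.add(cpu)
        return self.value

    def stwcx(self, cpu: int, new_value: int) -> bool:
        # Store only if this processor still holds its reservation.
        if cpu not in self.reservations:
            return False            # reservation lost: store fails, caller retries
        self.value = new_value
        self.reservations.clear()   # a successful store cancels all reservations
        return True


line = ReservationLine(value=10)
a = line.lwarx(cpu=0)          # processor 0 reads 10 and reserves the line
b = line.lwarx(cpu=1)          # processor 1 reads 10 and reserves the line
ok1 = line.stwcx(1, b + 1)     # processor 1 commits first: succeeds, value = 11
ok0 = line.stwcx(0, a + 1)     # processor 0's reservation was lost: fails
print(ok1, ok0, line.value)    # True False 11
```

Unlike the unprotected interleaving, the losing processor is told its store failed, so it can reload 11 and retry instead of silently overwriting the other update.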
The hint bit indicates whether the line (resource) is being held, for example as a lock, and should not be given up to other processors before it is released. When a change is performed, the hint bit tells the hardware whether to keep the line local or release it. If the hint bit is set incorrectly, the line bounces between processors unnecessarily and causes performance problems.
Power computing systems can also have local cache memory associated with each processor. Because there are multiple caches, the computing system needs cache coherence. In computing, cache coherence refers to the consistency of shared data stored in the local caches of multiple processors. In a shared-memory multiprocessor system with a separate cache memory for each processor, it is possible to have many copies of any one instruction operand: one copy in main memory and one in each cache memory. When one copy of an operand is changed, the other copies of the operand must be changed as well. Cache coherence is the discipline that ensures that changes in the values of shared operands are propagated throughout the system in a timely fashion. There are three distinct levels of cache coherence:
1) Every write operation appears to occur instantaneously.
2) All processors see exactly the same sequence of changes of values for each separate operand.
3) Different processors may see an operation and assume different sequences of values (this is considered non-coherent behavior).
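As a rough illustration of level 2 versus level 3 behavior for a single operand, the following Python sketch checks whether each processor's observed values follow one common order of writes. It assumes every write stores a distinct value, so values can stand in for writes; the function names are hypothetical. A processor may skip values it never read, but seeing two values in the opposite order from the write order indicates non-coherent (level 3) behavior.

```python
def is_subsequence(observed, order):
    """True if `observed` appears in `order` with the same relative ordering."""
    it = iter(order)
    # `v in it` advances the shared iterator, so each match must come
    # strictly after the previous one.
    return all(v in it for v in observed)


def coherence_level_2(write_order, observations):
    """Level 2 check for one operand: every processor's observations must be
    consistent with the single sequence in which the writes occurred."""
    return all(is_subsequence(obs, write_order) for obs in observations.values())


writes = [1, 2, 3, 4]                               # values written, in order
coherent = {"P0": [1, 2, 4], "P1": [1, 3, 4]}
non_coherent = {"P0": [1, 2, 4], "P1": [1, 3, 2]}   # P1 sees 3 before 2

print(coherence_level_2(writes, coherent))      # True
print(coherence_level_2(writes, non_coherent))  # False
```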
In both level 2 behavior and level 3 behavior, a program can observe stale data. Recently, computer designers have come to realize that the programming discipline required to deal with level 2 behavior is sufficient to deal also with level 3 behavior. Therefore, at some point only level 1 and level 3 behavior will be seen in machines.
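The stale-data problem can be illustrated with a toy write-through model in Python. `ToyCaches` and `demo` are hypothetical names, not a real protocol: with invalidation turned off, a processor keeps reading its old copy after another processor changes the operand; with write-invalidate turned on, the copy is discarded and refetched.

```python
class ToyCaches:
    """Hypothetical write-through model of one shared operand held in
    several processors' caches."""

    def __init__(self, n_cpus, initial, invalidate=True):
        self.memory = initial
        self.copies = [None] * n_cpus   # None = no valid copy in that cache
        self.invalidate = invalidate

    def read(self, cpu):
        if self.copies[cpu] is None:    # miss: fetch the current value from memory
            self.copies[cpu] = self.memory
        return self.copies[cpu]

    def write(self, cpu, value):
        self.memory = value             # write-through: memory is always current
        self.copies[cpu] = value
        if self.invalidate:
            for other in range(len(self.copies)):
                if other != cpu:
                    self.copies[other] = None   # drop every other copy


def demo(invalidate):
    caches = ToyCaches(n_cpus=2, initial=1, invalidate=invalidate)
    caches.read(0)
    caches.read(1)            # both processors now cache the value 1
    caches.write(0, 2)        # processor 0 changes the operand
    return caches.read(1)     # what processor 1 sees next


print(demo(invalidate=False))  # 1: stale copy
print(demo(invalidate=True))   # 2: copy was invalidated and refetched
```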
Cache coherence is maintained by a coherency protocol, which keeps all the caches in a distributed shared-memory system consistent. The protocol maintains memory coherence according to a specific consistency model. Older multiprocessors support the sequential consistency model, while modern shared-memory systems typically support the release consistency or weak consistency models.
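To make the difference between consistency models concrete, this Python sketch enumerates every sequentially consistent interleaving of the well-known store-buffering litmus test (each thread stores 1 to one variable, then loads the other). Under sequential consistency the result r0 = r1 = 0 never occurs; weaker models, including the Power memory model, permit it unless a barrier separates the store from the load. The encoding of the test as tuples is an assumption of this sketch.

```python
from itertools import permutations

# Store-buffering litmus test: each thread stores 1, then loads the other variable.
THREADS = [
    [("store", "x"), ("load", "y", "r0")],   # thread 0
    [("store", "y"), ("load", "x", "r1")],   # thread 1
]


def sc_outcomes():
    """Collect (r0, r1) over every interleaving that keeps each thread's program order."""
    steps = [(t, i) for t in range(2) for i in range(2)]
    results = set()
    for order in permutations(steps):
        # Sequential consistency: each thread's operations stay in program order.
        if any(order.index((t, 0)) > order.index((t, 1)) for t in range(2)):
            continue
        memory, regs = {"x": 0, "y": 0}, {}
        for t, i in order:
            op = THREADS[t][i]
            if op[0] == "store":
                memory[op[1]] = 1
            else:
                regs[op[2]] = memory[op[1]]
        results.add((regs["r0"], regs["r1"]))
    return results


print(sorted(sc_outcomes()))  # [(0, 1), (1, 0), (1, 1)]
```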
Transitions between states in any specific implementation of these protocols may vary. For example, an implementation may choose different update and invalidation transitions such as update-on-read, update-on-write, invalidate-on-read, or invalidate-on-write. The choice of transition may affect the amount of inter-cache traffic, which in turn may affect the amount of cache bandwidth available for actual work. This should be taken into consideration in the design of distributed software that could cause strong contention between the caches of multiple processors.
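The traffic trade-off between invalidate-on-write and update-on-write can be estimated with a simple message count. In this hypothetical Python model, one fetch, invalidation, or update delivered to one cache counts as one message; real protocols differ in many details. The two traces show that neither policy wins everywhere: repeated writes by one processor favor invalidation, while a producer-consumer pattern favors updates.

```python
def count_messages(trace, policy, holders):
    """Count inter-cache messages for one line (hypothetical model).

    trace:   list of (cpu, op) pairs, op is "R" (read) or "W" (write)
    policy:  "invalidate" (invalidate-on-write) or "update" (update-on-write)
    holders: cpus that start with a valid copy of the line
    """
    holders = set(holders)
    messages = 0
    for cpu, op in trace:
        if cpu not in holders:          # miss: fetch the line (one message)
            messages += 1
            holders.add(cpu)
        if op == "W":
            others = holders - {cpu}
            messages += len(others)     # one invalidation or update per other copy
            if policy == "invalidate":
                holders = {cpu}         # the other copies are now invalid
    return messages


write_heavy = [(0, "W")] * 10           # one processor writes repeatedly
ping_pong = [(0, "W"), (1, "R")] * 5    # producer writes, consumer reads

for name, trace, start in [("write_heavy", write_heavy, {0, 1, 2}),
                           ("ping_pong", ping_pong, {0, 1})]:
    print(name,
          count_messages(trace, "invalidate", start),
          count_messages(trace, "update", start))
# write_heavy 2 20
# ping_pong 10 5
```

Under these assumptions, the write-heavy trace costs 2 messages with invalidation and 20 with updates, while the ping-pong trace costs 10 and 5 respectively.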
Software can set a hint bit on a lwarx instruction that specifies how the corresponding stcx (store conditional) operation is handled in the L2 cache. Code sequences take one of two forms:
“lock”→lwarx/stcx/critical section/store to release the lock
“atomic update”→lwarx/critical section/stwcx.
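The two code sequences can be sketched in Python on top of a minimal, hypothetical reservation model (`Line`, `lwarx`, `stwcx`, `store`, `atomic_update`, `try_lock`, and `unlock` are illustrative names, not a real API). In the atomic-update form, the work sits between lwarx and stwcx., so the line needs to be held only briefly; in the lock form, the stcx only acquires the lock, and the lock stays held until a later ordinary store releases it, which is the situation a lock hint describes. In the Power ISA this hint is carried in the EH operand of lwarx.

```python
class Line:
    """Minimal hypothetical lwarx/stwcx. model: one word plus a reservation set."""

    def __init__(self, value=0):
        self.value = value
        self.reserved = set()

    def lwarx(self, cpu):
        # Load the word and record a reservation for this processor.
        self.reserved.add(cpu)
        return self.value

    def stwcx(self, cpu, new):
        # Conditional store: succeeds only if the reservation is still held.
        if cpu not in self.reserved:
            return False
        self.store(new)
        return True

    def store(self, new):
        # Any store to the line cancels all outstanding reservations.
        self.value = new
        self.reserved.clear()


def atomic_update(line, cpu, delta):
    """'Atomic update': lwarx / compute (critical section) / stwcx., retried on failure."""
    while True:
        old = line.lwarx(cpu)
        if line.stwcx(cpu, old + delta):
            return old + delta


def try_lock(lock, cpu):
    """'Lock' acquire step: lwarx / stcx to change the lock word from 0 to 1."""
    return lock.lwarx(cpu) == 0 and lock.stwcx(cpu, 1)


def unlock(lock):
    """'Lock' release step: an ordinary store of 0 after the critical section."""
    lock.store(0)


counter = Line(0)
print(atomic_update(counter, cpu=0, delta=5))  # 5

lock, data = Line(0), Line(0)
print(try_lock(lock, cpu=0))   # True: processor 0 takes the lock
print(try_lock(lock, cpu=1))   # False: the lock word is already 1
data.store(data.value + 1)     # critical section run by processor 0
unlock(lock)
print(try_lock(lock, cpu=1))   # True: processor 1 acquires after the release
```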
A major cause of performance problems in the field has been software setting these bits incorrectly, which has led to serious performance-scaling issues.
The lwarx instruction can include a hint-bit field that indicates to the hardware whether an action is an atomic update or a lock. If the compiler sets the hint bit correctly, the hardware keeps the line (data, resource, etc.) reserved while a process is still using it. If the hint bit is set incorrectly, it indicates that the line is available to other processes. If other processes then attempt to reserve the line before the current process releases it, the line can move back and forth between processes and degrade performance.
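The cost of a wrong hint can be approximated with a toy two-core model in Python. Everything here is an assumption made for illustration: core 0 holds the lock and touches the line several times in its critical section (as if the lock word shared the line with the data it protects), core 1 spins with lwarx between those touches, and a correct lock hint is modeled as the holder's cache refusing to give up the line until the release store. The hypothetical function `count_transfers` counts how often the line moves between caches.

```python
LOCK, ATOMIC_UPDATE = "lock", "atomic_update"


def count_transfers(hint, touches):
    """Hypothetical two-core model of one contended lock line.

    Returns how many times the line moves between the two caches while
    core 0 runs its critical section and core 1 spins on the lock.
    """
    owner, transfers = 0, 0

    def pull(core):
        # Move the line into `core`'s cache if it is not already there.
        nonlocal owner, transfers
        if owner != core:
            owner, transfers = core, transfers + 1

    for _ in range(touches):
        if hint != LOCK:
            pull(1)      # wrong hint: core 1's lwarx steals the line
        pull(0)          # core 0's next access pulls the line back
    pull(0)              # core 0's release store
    pull(1)              # core 1 now acquires the lock
    return transfers


print(count_transfers(LOCK, 4))           # 1
print(count_transfers(ATOMIC_UPDATE, 4))  # 9
```

With four touches, the correct hint moves the line once (to core 1 after the release), while the wrong hint moves it nine times; the gap grows linearly with the length of the critical section in this model.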
Currently, incorrect hint bits are detected through observation by individual personnel, who evaluate the performance of a line and, from experience, estimate whether incorrect hint bits are causing a problem. There remains a need for a method and system that can automatically and dynamically detect an incorrect hint bit and initiate actions to correct it.