The present invention relates to computer systems, and more particularly to such systems executing multiple threads.
Computer systems, including multiprocessor (MP) and single-processor systems, may include a plurality of “threads,” each of which executes program instructions independently of the other threads. Use of multiple processors or threads allows various tasks or functions, and even multiple applications, to be handled more efficiently and with greater speed. Utilizing multiple threads or processors means that two or more processors or threads can share and simultaneously access the same data stored within the system. However, care must be taken to maintain memory ordering when sharing data.
For data consistency purposes, if multiple threads or processors desire to read, modify, and write to a single memory location, these agents should not be allowed to perform operations on the data simultaneously. Further complicating the use of multiple processors is that data is often stored in a cache associated with a processor to speed access to the data by that processor. Because such caches are typically localized to a specific processor, the most recent update to the data could be located in any one of the caches in the system. Any agent accessing this data should receive a valid or updated data value from the cache with the most recent update, and data being written from the cache back into memory or transferred to other caches must be the current data so that cache coherency is maintained.
Multithreaded (MT) software uses different mechanisms to interact and coordinate between different threads. One common form of MT synchronization is a semaphore spin-lock. A semaphore spin-lock mechanism is a lock operation used to guarantee mutual exclusion (i.e., prevent simultaneous access) across multiple threads while accessing a shared memory variable or structure (i.e., a shared element). In order to provide a unique and consistent view of the shared element, it is guarded by a lock variable. Every thread needing access to the shared element must acquire the guarding lock via an atomic semaphore operation. To acquire the lock, a thread essentially reads the value of the lock variable, compares the value to a predetermined ‘free’ value, and, if the values match, writes a ‘lock’ value. This read-modify-write operation must appear to happen in one step; otherwise, multiple threads could each read a ‘free’ value and then write the ‘lock’ value, and each would believe it had acquired the lock.
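As an illustration only, the read-compare-write acquisition described above might be sketched in C++ with a single atomic compare-and-exchange. The names `kFree`, `kLocked`, `try_acquire`, and `release`, and the integer encodings of the two states, are assumptions made for this sketch and do not come from the description; a real implementation could encode the lock variable differently.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical encodings of the two lock states (assumed for this sketch).
constexpr int kFree = 0;
constexpr int kLocked = 1;

// One acquisition attempt. The compare-and-exchange reads the lock variable,
// compares it with kFree, and writes kLocked only if the comparison succeeds,
// all as a single atomic step. Returns true if the caller now owns the lock.
bool try_acquire(std::atomic<int>& lock_var) {
    int expected = kFree;
    return lock_var.compare_exchange_strong(expected, kLocked,
                                            std::memory_order_acquire,
                                            std::memory_order_relaxed);
}

// Writes the 'free' value back once the owner is done with the shared element.
void release(std::atomic<int>& lock_var) {
    // Sanity check: only a held lock should be released.
    assert(lock_var.load(std::memory_order_relaxed) == kLocked);
    lock_var.store(kFree, std::memory_order_release);
}
```

Because the comparison and the write cannot be separated, a second thread calling `try_acquire` while the lock is held sees `kLocked`, fails the comparison, and receives `false` rather than overwriting the lock variable.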
After a given thread acquires a lock on a lock variable, other threads desiring to access the lock variable typically must wait until the original thread completes its lock operation. The owning thread writes the ‘free’ value back into the lock variable when it has finished modifying the shared variable or structure. In the meantime, other threads seeking access will typically initiate a snoop on the address of the lock variable to check its state (i.e., ‘free’ or ‘locked’). A thread that finds the ‘locked’ value will often wait a short time and snoop again, thus spinning in a small snoop-wait loop. Contention occurs when one or more threads desire access to a lock variable that is already owned (i.e., locked) by another thread or that is being accessed by another thread. A lock operation is uncontended when, during runtime, only one agent at a time seeks to execute a lock operation on the lock variable.
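The snoop-wait loop can be approximated in software, as a rough sketch, by polling the lock variable with plain loads between acquisition attempts. The `SpinLock` type, the `run_workers` helper, and the use of `std::this_thread::yield()` as the "short wait" are all assumptions made for illustration; hardware snooping itself is not visible at this level.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

constexpr int kFree = 0;    // hypothetical 'free' encoding
constexpr int kLocked = 1;  // hypothetical 'locked' encoding

struct SpinLock {
    std::atomic<int> state{kFree};

    // Spins until the lock is acquired. Returns how many acquisition
    // attempts failed because the lock was not free; 0 means this
    // particular acquisition was uncontended.
    long acquire() {
        long failed_attempts = 0;
        for (;;) {
            int expected = kFree;
            if (state.compare_exchange_strong(expected, kLocked,
                                              std::memory_order_acquire,
                                              std::memory_order_relaxed)) {
                return failed_attempts;
            }
            ++failed_attempts;
            // Wait-and-check loop: poll the state until it reads 'free'.
            while (state.load(std::memory_order_relaxed) != kFree) {
                std::this_thread::yield();
            }
        }
    }

    // The owner writes the 'free' value back when it is done.
    void release() {
        assert(state.load(std::memory_order_relaxed) == kLocked);
        state.store(kFree, std::memory_order_release);
    }
};

// Hypothetical driver: several threads update one shared element under the lock.
long run_workers(int num_threads, int iterations) {
    SpinLock lock;
    long shared_counter = 0;  // the shared element guarded by the lock
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.acquire();
                ++shared_counter;
                lock.release();
            }
        });
    }
    for (auto& w : workers) w.join();
    return shared_counter;
}
```

With a single thread, every call to `acquire` returns 0, which corresponds to the uncontended case. With several threads, some calls return a nonzero count, which corresponds to the contended case, while the final counter still equals `num_threads * iterations` because the updates are mutually exclusive.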
Because synchronization occurs frequently in MT applications, a processor should efficiently implement lock operations so that the MT applications may perform as desired. If a lock operation is uncontended at runtime, it can be heavily optimized. Such optimizations can include speculative prefetching of an associated cache line, lock elision, and the like. However, these optimization techniques incur a significant penalty if the lock operation is contended.
A need therefore exists to predict contended lock operations, thereby enabling a processor to efficiently implement lock operations.