Many computing systems use parallel computing to improve performance, for example by breaking a computation task into several subtasks that can be processed independently and whose results are combined upon completion. Common parallel computing systems include multi-processor computers and computers with multi-core processors. To take advantage of the enhanced performance, software can be divided into many independently executing threads, each running in parallel on a separate processing element.
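As a minimal sketch of the split-and-combine pattern described above, the following Python example divides a summation into subtasks, processes each on a worker thread, and combines the partial results. The names `partial_sum` and `parallel_sum` and the default of four workers are illustrative assumptions, not part of the original text. In standard CPython builds the global interpreter lock limits true multi-core speedup for this kind of work, so the example illustrates the structure rather than the performance gain.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """Subtask: sum one independent slice of the input."""
    return sum(chunk)


def parallel_sum(values, workers=4):
    """Split `values` into chunks, sum each on a worker thread, combine results.

    `workers` is a hypothetical tuning parameter chosen for illustration.
    """
    # Ceiling division so every element lands in exactly one chunk.
    size = max(1, (len(values) + workers - 1) // workers)
    chunks = [values[i:i + size] for i in range(0, len(values), size)]

    # Each chunk is processed independently of the others.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, chunks))

    # Combine the subtask results upon completion.
    return sum(partials)
```

Calling `parallel_sum(list(range(1, 101)))` splits the hundred integers into four chunks of twenty-five and adds the four partial sums together.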
While parallel computing can enhance performance, it introduces additional overhead to coordinate the separately executing threads. One mechanism for communicating and coordinating between threads is to pass information through shared memory locations. Approaches based on shared memory typically include some form of locking protocol so that individual threads can know the status of the data they read from shared memory locations.
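The shared-memory coordination described above can be sketched as follows. This is an illustrative example only: the `SharedCounter` class and `run_workers` helper are hypothetical names, and a simple mutual-exclusion lock stands in for whatever locking protocol a given system actually uses. Each thread acquires the lock before it reads and updates the shared value, so no thread acts on a stale read.

```python
import threading


class SharedCounter:
    """A value held in memory shared by several threads (hypothetical example)."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The lock makes the read-modify-write below appear atomic to
        # other threads, so no update is lost.
        with self._lock:
            self.value += 1


def run_workers(num_threads=4, increments=1000):
    """Start `num_threads` threads that each increment the shared counter."""
    counter = SharedCounter()

    def worker():
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

With the lock in place, the final count equals the number of threads multiplied by the number of increments per thread.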
One approach to locking access to a shared memory location involves the operating system, which places threads requesting a contended lock into a suspended state so that other threads can be scheduled to run. However, relying on the operating system to implement thread locks can cost a significant number of machine cycles in overhead, which in turn can degrade performance.
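The blocking behavior described above can be observed with a small sketch. Here a worker thread requests a lock that the main thread already holds, so the worker cannot proceed until the lock is released. The function name `contended_lock_demo` and the event labels are assumptions made for illustration. Whether the waiting thread is actually suspended by the operating system, and at what cost, depends on the runtime and platform; the example shows only the ordering that blocking enforces.

```python
import threading


def contended_lock_demo():
    """Show that a thread requesting a held lock waits until it is released."""
    lock = threading.Lock()
    events = []

    def worker():
        # Blocks here while the main thread holds the lock.
        lock.acquire()
        try:
            events.append("worker acquired")
        finally:
            lock.release()

    lock.acquire()                     # main thread takes the lock first
    t = threading.Thread(target=worker)
    t.start()                          # worker now contends for the lock
    events.append("main releasing")    # recorded before the lock is freed
    lock.release()                     # worker can be scheduled to proceed
    t.join()
    return events
```

Because the worker cannot acquire the lock until the main thread releases it, the main thread's event is always recorded first.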