There may be many cases where one thread in an application has to wait for another thread to reach a certain point before the first thread can make further progress. The waiting thread may, for example, be waiting on a mutual exclusion lock (mutex), or be consuming events from a queue and need work to be placed on the queue before progress can be made. A typical simplistic approach is to make use of operating system primitives, exposed through interfaces such as pthread condition variables, to allow a thread to wait (using a wait call) until another thread has signaled it to wake up (using a signal call). This is perfectly reasonable in cases with low performance requirements. Nevertheless, at higher throughputs, the following problems are exhibited. First, both wait and signal require invocations into the operating system (OS) scheduler, paying the cost of a system call and the cost of any scheduler operations; and because the OS scheduler is typically a single entity in a system, many threads calling into the scheduler will be serialized, creating a scalability bottleneck. Second, when one thread signals another thread, both threads will typically contend on mutexes or atomic compare-and-swap operations, leading to cache misses and CPU stalls.
Much higher throughput can be achieved if, instead of immediately executing the wait/signal synchronization primitives, threads which need to wait for a resource simply defer checking the resource, examining it again after a short delay. If a thread is not blocked in a wait, then the thread making the resource available does not have to signal the consuming thread, provided that it is known that the waiting thread will examine the resource within a short time. Again, the simplistic approach is to use operating system primitives such as a sleep call, but these suffer from the same scheduler interaction problems as wait does; namely, there is a relatively large cost in interacting with the OS scheduler. Furthermore, sleeps are typically of a much coarser resolution than is desired (on the order of milliseconds to tens or hundreds of milliseconds), which causes very large latencies and much larger than desired batches (which in turn can cause cache misses, or even cause a process to swap out to disk, adversely affecting performance).
Instead of using a “wait” call or a “sleep” call to the OS scheduler, a common technique is for a thread to execute a loop a certain number of times, for the purpose of delaying further useful execution of the thread before examining the resource. This is referred to as a “spin loop.” The spin loop takes a number of CPU clock cycles to complete but does not otherwise do anything useful. By delaying examination of the contended resource, contention is reduced. Furthermore, by not calling the OS scheduler, the disadvantages of interacting with the scheduler are avoided.
The optimal spin time for a spin loop is short enough to be comparable to the time it would have taken to use wait and signal (taking into account the combined overhead of the wait and the signal), but long enough to reduce contention on the resource being waited on. Otherwise, CPU cache misses are provoked on both the consumer thread and, more critically, the producing thread. After a spin loop has completed, if there is still no way for the thread to make progress (for example, the mutex it is waiting to lock is still held, or no work has been added to the queue), then it may be desirable to fall back to the OS scheduler-driven wait and signal technique. Although the overhead of using the OS scheduler is greater than that of the spin loop, it is undesirable to have threads simply spinning for long durations without achieving useful work, because this can prevent other threads from getting useful work done, and it wastes energy, thereby increasing running costs. Thus, a spin loop which is too short can degrade to the simplistic wait and signal approach, with the attendant performance problems.
These and other known solutions are examples of standard techniques for coordinating between threads which can be synchronized with each other and/or can share resources.
Nevertheless, the conventional spin loop technique and the wait/signal operations do not overcome these drawbacks and can exhibit problems when operated in real-world scenarios.