Technical Field
Embodiments described herein generally relate to processors. In particular, embodiments described herein generally relate to pausing threads in processors.
Background Information
Software multithreading is a technique that has been used to help improve processor performance. In software multithreading, code (e.g., an application) may be partitioned into multiple threads. Each thread may represent an instruction stream or sequence that is capable of being performed separately from the others and/or in parallel. As one simple example, one thread may handle images of a video stream, while another thread may handle audio of the video stream.
Different approaches are available in terms of processor hardware to implement multithreading. One approach is known as interleaved or temporal multithreading. One example of such an approach is time-slice multithreading or time-multiplex (TMUX) multithreading, in which a single physical processor (e.g., a core) switches between threads on alternating cycles, or after a fixed period of time. Another example of such an approach is switch-on-event multithreading (SoEMT), in which a single physical processor switches between threads upon occurrence of a trigger event, for example, a long latency cache miss, a page fault, other long latency events, or the like. In interleaved or temporal multithreading, generally only one thread of instructions may execute in a given pipeline stage at a time.
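As a rough illustration of the two interleaved approaches described above, the following toy model computes which of two threads occupies the pipeline on each cycle. It is only a sketch under simplifying assumptions: the function names, the two-thread limit, and the `events` input (true on a cycle where the running thread encounters a long latency event) are hypothetical and are not taken from any particular processor.

```rust
/// Time-slice (TMUX) model: switch between thread 0 and thread 1
/// after every `slice` cycles. With `slice == 1` the threads alternate
/// on every cycle.
fn time_slice_schedule(cycles: usize, slice: usize) -> Vec<usize> {
    assert!(slice > 0, "slice length must be at least one cycle");
    (0..cycles).map(|c| (c / slice) % 2).collect()
}

/// Switch-on-event (SoEMT) model: keep running the current thread until
/// it hits a trigger event (e.g., a long latency cache miss), then hand
/// the pipeline to the other thread starting with the next cycle.
fn switch_on_event_schedule(events: &[bool]) -> Vec<usize> {
    let mut current = 0;
    let mut schedule = Vec::with_capacity(events.len());
    for &event in events {
        // Only one thread issues in a given cycle in this model.
        schedule.push(current);
        if event {
            current = 1 - current;
        }
    }
    schedule
}
```

In both models exactly one thread is recorded per cycle, which mirrors the statement that only one thread of instructions executes in a given pipeline stage at a time.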
Another multithreading approach is known as simultaneous multithreading (SMT). In SMT, instructions from more than one thread may be executing concurrently in a given pipeline stage of a single physical processor (e.g., a core) at a given time. For example, a single core may be made to appear as multiple logical processors to software, with each logical processor performing a different thread. Some resources of the core may be dedicated to a given thread or logical processor. For example, commonly each thread or logical processor may maintain a complete set of the architecture state. Other resources of the core may be shared by two or more threads or logical processors. For example, depending upon the particular implementation, caches, execution units, branch predictors, decoders, other logic, or a combination thereof, may be shared by two or more threads executing in a single physical core.
One challenge in multithreading is efficient handling of spin-wait loops. Threads often need to share resources and/or synchronize with other threads. A spin-wait loop is a technique used in multithreaded applications in which one thread waits on one or more other threads, for example, to synchronize or to gain access to a shared resource. The spin-wait loop may represent a routine or section of code in which a thread accesses a synchronization primitive (e.g., a shared lock, semaphore, or mutex) in a tight polling loop. For example, the logical processor may execute a load-compare-branch loop that repeatedly loads the synchronization primitive and compares it against a desired value until the two match. The logical processor is generally able to execute the spin-wait loop very quickly, which may consume a significant amount of power and execution resources. However, executing the spin-wait loop rapidly generally does not improve performance, since the waiting thread cannot make progress until another thread changes the synchronization primitive.
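A load-compare-branch loop of the kind described above might look like the following minimal sketch. The names `spin_wait` and `run_demo`, the use of an atomic flag as the synchronization primitive, and the 10 ms delay standing in for real work are all assumptions made for illustration.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Polls `flag` until it becomes true and returns the number of polls
/// that observed it still false. Each iteration is a load, a compare,
/// and a branch, with nothing to slow the loop down.
fn spin_wait(flag: &AtomicBool) -> u64 {
    let mut polls: u64 = 0;
    while !flag.load(Ordering::Acquire) {
        polls += 1;
    }
    polls
}

/// Hypothetical demo: a worker thread sets the flag after a short delay
/// while the calling thread spins on it.
fn run_demo() -> (bool, u64) {
    let flag = Arc::new(AtomicBool::new(false));
    let worker_flag = Arc::clone(&flag);
    let worker = thread::spawn(move || {
        // Stand-in for the work the waiting thread depends on.
        thread::sleep(Duration::from_millis(10));
        worker_flag.store(true, Ordering::Release);
    });
    let polls = spin_wait(&flag);
    worker.join().unwrap();
    (flag.load(Ordering::Acquire), polls)
}
```

The poll count returned by `run_demo` will typically be very large relative to the single state change being waited for, which illustrates why spinning faster does not by itself help: every poll before the worker's store yields the same answer.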
A PAUSE or spin-loop hint instruction is described in Intel® 64 and IA-32 Architectures Software Developer's Manual, Order Number: 325462-049US, February 2014. The PAUSE instruction is reported to improve the performance of spin-wait loops. It is also reported that an additional function of the PAUSE instruction is to reduce the amount of power consumed by a processor while executing a spin-wait loop.
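The following sketch shows where a spin-loop hint would typically be placed in a simple spin lock. It uses Rust's standard `std::hint::spin_loop` function, which signals to the processor that the code is in a busy-wait loop; on x86 targets this is commonly lowered to the PAUSE instruction, although that mapping is an assumption about the compiler and target rather than something stated in the source. The `SpinLock` type and the `count_with_spinlock` helper are hypothetical names used only for this example.

```rust
use std::hint;
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Minimal spin lock built on an atomic flag (illustrative only).
struct SpinLock {
    locked: AtomicBool,
}

impl SpinLock {
    fn new() -> Self {
        SpinLock { locked: AtomicBool::new(false) }
    }

    fn lock(&self) {
        // Try to change the flag from false to true; on failure, wait.
        while self
            .locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            // Poll with plain loads until the lock looks free, issuing
            // the spin-loop hint on every iteration of the wait.
            while self.locked.load(Ordering::Relaxed) {
                hint::spin_loop();
            }
        }
    }

    fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }
}

/// Hypothetical usage: several threads increment a shared counter,
/// with the spin lock serializing each read-modify-write.
fn count_with_spinlock(threads: usize, iters: u64) -> u64 {
    let lock = Arc::new(SpinLock::new());
    let counter = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let lock = Arc::clone(&lock);
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..iters {
                    lock.lock();
                    // Separate load and store: correct only because
                    // the lock is held across both.
                    let value = counter.load(Ordering::Relaxed);
                    counter.store(value + 1, Ordering::Relaxed);
                    lock.unlock();
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    counter.load(Ordering::Relaxed)
}
```

The hint does not change what the loop computes; it only gives the processor an opportunity to slow the polling, which is the behavior the PAUSE instruction is reported to provide.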
U.S. Pat. No. 6,671,795 describes a method and apparatus for pausing execution in a processor. It is disclosed in part that a pause instruction may be used to pause execution of one thread in order to give preference to another thread or to save power.