Technical Field
Embodiments described herein relate to processors. In particular, embodiments described herein generally relate to processors that are operable to perform an instruction that monitors for a write to an address.
Background Information
Advances in semiconductor processing and logic design have permitted an increase in the amount of logic that may be included in processors and other integrated circuit devices. As a result, many processors now have multiple to many cores that are monolithically integrated on a single integrated circuit or die. The multiple cores generally help to allow multiple software threads or other workloads to be performed concurrently, which generally helps to increase execution throughput.
One challenge in such multiple core processors is that greater demands are often placed on caches that are used to cache data and/or instructions from memory. For one thing, there tends to be an ever-increasing demand for higher interconnect bandwidth to access data in such caches. One technique to help increase the interconnect bandwidth to caches involves using a distributed cache. The distributed cache may include multiple physically separate or distributed cache slices or other cache portions. Such a distributed cache may allow parallel access to the different distributed portions of the cache through a shared interconnect.
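By way of illustration only, the following sketch shows one way an address might be mapped to a cache slice so that accesses to different addresses can proceed in parallel. The function name slice_for_address, the 64-byte line size, and the simple modulo mapping are all assumptions made for this example; the text above does not specify how addresses are distributed across slices, and actual processors may use a different (e.g., hashed) mapping.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed cache line size in bytes; not stated in the text. */
#define CACHE_LINE_BYTES 64u

/*
 * Hypothetical mapping: consecutive cache lines rotate across the slices,
 * so every byte of one cache line maps to the same slice.
 */
unsigned slice_for_address(uint64_t phys_addr, unsigned num_slices)
{
    assert(num_slices > 0);
    uint64_t line_index = phys_addr / CACHE_LINE_BYTES;
    return (unsigned)(line_index % num_slices);
}
```

Under these assumptions, with eight slices, addresses 0 through 63 fall in one slice and address 64 falls in the next, so two cores touching adjacent cache lines would be served by different slices.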
Another challenge in such multiple core processors is providing thread synchronization with respect to shared memory. Operating systems commonly implement idle loops to handle such synchronization. For example, there may be several busy loops that use a set of memory locations. A first thread may wait in a loop and poll a corresponding memory location. By way of example, the memory location may represent a work queue of the first thread, and the first thread may poll the work queue to determine if there is available work to perform. In a shared memory configuration, exits from the busy loop often occur due to a state change associated with the memory location. These state changes are commonly triggered by writes to the memory location by another component (e.g., another thread or core). For example, another thread or core may write to the work queue at the memory location to provide work to be performed by the first thread.
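The polling pattern described above may be sketched as follows. This is a minimal example, assuming a hypothetical one-entry work queue (work_slot) in which zero means that no work is available; the names post_work and poll_for_work are likewise invented for illustration. The loop is bounded by max_spins only so that the example terminates; an actual busy loop would typically spin until work arrives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical one-entry work queue: 0 means "no work", nonzero is a work item. */
typedef struct {
    _Atomic int pending;
} work_slot;

void work_slot_init(work_slot *s)
{
    assert(s != NULL);
    atomic_init(&s->pending, 0);
}

/* Producer (another thread or core): a write to the shared memory location. */
void post_work(work_slot *s, int item)
{
    assert(item != 0);
    atomic_store_explicit(&s->pending, item, memory_order_release);
}

/*
 * Consumer (the first thread): poll the location until its state changes.
 * Returns the work item, or 0 if nothing arrived within max_spins polls.
 * *spins_used reports how many polls were performed.
 */
int poll_for_work(work_slot *s, unsigned long max_spins, unsigned long *spins_used)
{
    for (unsigned long i = 0; i < max_spins; i++) {
        /* Take the item and clear the slot in one atomic step. */
        int item = atomic_exchange_explicit(&s->pending, 0, memory_order_acquire);
        if (item != 0) {
            *spins_used = i + 1;
            return item;
        }
    }
    *spins_used = max_spins;
    return 0;
}
```

Every unsuccessful iteration of the loop executes instructions without making progress, which is the cost that a hardware mechanism for waiting on a write to an address is intended to avoid.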
Certain processors (e.g., those available from Intel Corporation, of Santa Clara, Calif.) are able to use MONITOR and MWAIT instructions to achieve thread synchronization with respect to shared memory. A hardware thread or other logical processor may use the MONITOR instruction to set up a linear address range to be monitored by a monitor unit, and arm or activate the monitor unit. The address may be provided through a general purpose register. The address range is generally of the write-back caching type. The monitor unit will monitor and detect stores/writes to an address within the address range, which will trigger the monitor unit.
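The arming and triggering behavior described above may be modeled in software as follows. This is an illustrative model only, not the MONITOR instruction itself, which generally cannot be executed from ordinary application code. The structure monitor_unit and the functions monitor_arm and monitor_observe_store are hypothetical names, and the choice to disarm the monitor once it has triggered is an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Software model of a monitor unit watching one address range. */
typedef struct {
    uintptr_t base;      /* first monitored address */
    size_t    length;    /* size of the monitored range in bytes */
    bool      armed;     /* set by the modeled MONITOR operation */
    bool      triggered; /* set when a store hits the armed range */
} monitor_unit;

/* Models MONITOR: set up the address range and arm the monitor unit. */
void monitor_arm(monitor_unit *m, const void *addr, size_t length)
{
    assert(m != NULL && length > 0);
    m->base = (uintptr_t)addr;
    m->length = length;
    m->armed = true;
    m->triggered = false;
}

/* Called by the model for each store/write that the monitor unit observes. */
void monitor_observe_store(monitor_unit *m, const void *addr)
{
    uintptr_t a = (uintptr_t)addr;
    if (m->armed && a >= m->base && a - m->base < m->length) {
        m->triggered = true;
        m->armed = false; /* modeling assumption: a trigger disarms the unit */
    }
}
```

In this model, a store to an address outside the armed range, or any store observed before the unit is armed, leaves the triggered flag clear.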
The MWAIT instruction may follow the MONITOR instruction in program order, and may serve as a hint to allow the hardware thread or other logical processor to stop instruction execution and enter an implementation-dependent state. For example, the logical processor may enter a reduced power consumption state. The logical processor may remain in that state until detection of one of a set of qualifying events associated with the MONITOR instruction. A write/store to an address in the address range armed by the preceding MONITOR instruction is one such qualifying event. In such cases, the logical processor may exit the state and resume execution with the instruction following the MWAIT instruction in program order.
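The MONITOR/MWAIT sequence described above may be illustrated with a small state-machine model. The sketch below is a software model under stated assumptions, not the instructions themselves: the type logical_processor and the functions lp_monitor, lp_mwait, and lp_deliver are hypothetical names, EV_OTHER_QUALIFYING stands in for the remaining members of the set of qualifying events, which the text does not enumerate, and the rule that the modeled MWAIT falls through when the monitor is no longer armed is a modeling assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { LP_RUNNING, LP_WAITING } lp_state;

typedef enum {
    EV_STORE_IN_RANGE,     /* write to an address in the armed range */
    EV_STORE_OUT_OF_RANGE, /* write elsewhere; not a qualifying event */
    EV_OTHER_QUALIFYING    /* placeholder for other qualifying events */
} lp_event;

typedef struct {
    lp_state state;
    bool     monitor_armed;
} logical_processor;

void lp_init(logical_processor *lp)
{
    assert(lp != NULL);
    lp->state = LP_RUNNING;
    lp->monitor_armed = false;
}

/* Models MONITOR: arm the monitor for the (implicit) address range. */
void lp_monitor(logical_processor *lp)
{
    lp->monitor_armed = true;
}

/* Models MWAIT: enter the waiting state only if the monitor is still armed. */
void lp_mwait(logical_processor *lp)
{
    if (lp->monitor_armed)
        lp->state = LP_WAITING;
}

/* Delivers an event; returns true if it woke a waiting logical processor. */
bool lp_deliver(logical_processor *lp, lp_event ev)
{
    bool store_hit = (ev == EV_STORE_IN_RANGE) && lp->monitor_armed;
    if (store_hit)
        lp->monitor_armed = false; /* a hit consumes the arming */
    if (lp->state == LP_WAITING && (store_hit || ev == EV_OTHER_QUALIFYING)) {
        lp->state = LP_RUNNING;    /* resume after the modeled MWAIT */
        lp->monitor_armed = false;
        return true;
    }
    return false;
}
```

In this model, a store that hits the armed range between the modeled MONITOR and MWAIT operations disarms the monitor, so the subsequent lp_mwait call does not enter the waiting state and the write is not missed.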