1. Field
The described embodiments relate to computing devices. More specifically, the described embodiments relate to a conditional notification mechanism for computing devices.
2. Related Art
Many modern computing devices include two or more hardware contexts, such as two or more separate hardware thread contexts in a central processing unit (CPU) or graphics processing unit (GPU) and/or two or more CPU or GPU processor cores. In some cases, two or more hardware contexts in a computing device need to communicate with one another to determine if a given event has occurred. For example, a first CPU processor core may reach a synchronization point at which the first CPU processor core communicates with a second CPU processor core to determine if the second CPU processor core has reached a corresponding synchronization point. Several techniques have been proposed to enable hardware contexts to communicate with one another to determine if a given event has occurred, as described below.
A first technique for communicating between hardware contexts is a “polling” technique, in which a first hardware context repeatedly reads a shared memory location and determines whether the value in the shared memory location meets a condition, continuing until the condition is met. For this technique, a second (and perhaps third, fourth, etc.) hardware context updates the shared memory location when a designated event has occurred (e.g., when the second hardware context has reached a synchronization point). This technique is inefficient in terms of power consumption because the first hardware context must fetch and execute instructions for performing the reading and determining operations on every iteration. Additionally, this technique is inefficient in terms of cache traffic because reading the shared memory location can require invalidation of a cached copy of the shared memory location. Moreover, this technique is inefficient because the polling hardware context consumes computational resources that could otherwise be used for performing other computational operations.
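A minimal sketch of the polling technique follows, assuming that a lock-protected integer (`SharedLocation`, a hypothetical name) stands in for the shared memory location and that Python threads stand in for the hardware contexts. The `reads` counter makes the wasted work visible: the first context keeps fetching and testing the value until the second context has made enough updates.

```python
import threading
import time


class SharedLocation:
    """Stands in for a shared memory location (hypothetical model)."""

    def __init__(self, value: int = 0) -> None:
        self.value = value
        self._lock = threading.Lock()

    def read(self) -> int:
        with self._lock:
            return self.value

    def add(self, delta: int) -> None:
        with self._lock:
            self.value += delta


def poll_until(location: SharedLocation, condition) -> tuple[int, int]:
    """First context: busy-wait, reading and testing until the condition holds."""
    reads = 0
    while True:
        reads += 1
        value = location.read()
        if condition(value):
            return value, reads


def producer(location: SharedLocation, events: int, delay: float = 0.001) -> None:
    """Second context: update the location each time an event occurs."""
    for _ in range(events):
        time.sleep(delay)  # Simulated time between events.
        location.add(1)


if __name__ == "__main__":
    loc = SharedLocation()
    worker = threading.Thread(target=producer, args=(loc, 30))
    worker.start()
    value, reads = poll_until(loc, lambda v: v > 25)
    worker.join()
    print(f"condition met at value {value} after {reads} reads")
```

The number of reads varies from run to run, but it is usually much larger than the number of updates, which is the inefficiency described above.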
A second technique for communicating between hardware contexts is an interrupt scheme, in which an interrupt is triggered by a first hardware context in order to communicate with a second (and perhaps third, fourth, etc.) hardware context. This technique is inefficient because processing interrupts in the computing device requires that numerous operations be performed. For example, in some computing devices, it is necessary to flush instructions from one or more pipelines and save state before an interrupt handler can process the interrupt. In addition, in some computing devices, processing an interrupt requires communicating the interrupt to an operating system on the computing device for prioritization and may require invoking scheduling mechanisms (e.g., a thread scheduler, etc.).
A third technique for communicating between hardware contexts is the use of instructions such as the MONITOR and MWAIT instructions. For this technique, a first hardware context executes the MONITOR instruction to configure a cache coherency mechanism in the computing device to monitor for updates to a designated memory location. When the first hardware context then executes the MWAIT instruction, it signals the coherency mechanism (and the computing device generally) that it is transitioning to a wait (idle) state until an update (e.g., a write) is made to the memory location. When a second hardware context updates the memory location by writing to the memory location, the coherency mechanism recognizes that the update has occurred and forwards a wake-up signal to the first hardware context, causing the first hardware context to exit the idle state. This technique is useful for simple cases where a single update is made to the memory location. However, the technique is inefficient when the value in the memory location must meet a condition. For example, assume that the condition is that the value in the memory location, which starts at 0, must be greater than 25, and that the second hardware context increases the value in the memory location by at least one each time an event occurs. In this case, the first hardware context may have to execute the MONITOR/MWAIT instructions and conditional checking instructions as many as 26 times before the value in the memory location meets the condition.
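The worst-case count in this example can be checked with a small sequential model. The model is an assumption, not a description of the MONITOR/MWAIT hardware: it treats each write by the second hardware context as waking the first hardware context exactly once, after which the first context re-runs its conditional check. The function name `count_wait_rounds` is hypothetical.

```python
from typing import Callable, Iterable


def count_wait_rounds(start: int, increments: Iterable[int],
                      condition: Callable[[int], bool]) -> int:
    """Count wake-up/check rounds a waiting context executes (hypothetical model).

    Each round models one MONITOR/MWAIT pair followed by the conditional
    check: the waiter arms the monitor, idles until the second context
    writes the location, wakes, and re-tests the value.
    """
    value = start
    if condition(value):
        return 0  # Condition already met; no need to wait at all.
    rounds = 0
    for delta in increments:
        value += delta  # Second context writes the location, waking the waiter.
        rounds += 1     # Waiter wakes once per write and re-runs the check.
        if condition(value):
            return rounds
    raise RuntimeError("increments exhausted before the condition was met")


if __name__ == "__main__":
    # Value starts at 0, the condition is "greater than 25", +1 per event.
    print(count_wait_rounds(0, [1] * 100, lambda v: v > 25))  # prints 26
    print(count_wait_rounds(0, [2] * 100, lambda v: v > 25))  # prints 13
```

With increments of exactly one, the model needs 26 rounds; larger increments, such as 2, need fewer (13), which is why 26 is the upper bound for this example.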
A fourth technique for communicating between hardware contexts employs a user-level interrupt mechanism, in which a first hardware context specifies the address of a memory location (a “flag”). When a second hardware context subsequently updates (e.g., sets) the flag, the first hardware context is signaled to execute an interrupt handler. For this technique, much of the control for handling the communication between the hardware contexts is passed to software, and thus to the programmer. Because software is used for handling the communication between the hardware contexts, the technique is inefficient and error-prone.
As described above, the various techniques that have been proposed to enable hardware contexts to communicate with one another to determine if a given event has occurred are inefficient in one way or another.