Signaling between threads, whether on the same processing device or on different processing devices, is often achieved by modifying a value stored at a predetermined memory address of a memory accessible to the threads. One common implementation is software-based polling, whereby one thread enters a software polling loop and waits until another thread stores a predetermined value at the memory address. This polling loop typically entails repeated memory accesses, each of which reloads the value currently stored at the memory address to determine whether it has been modified. These frequent memory accesses increase traffic on the memory bus and therefore can limit the memory bandwidth available to other processes. Further, the processing device executing the waiting thread is occupied with repeatedly executing the instructions of the polling loop, thereby limiting the processing bandwidth available to other threads associated with the processing device. Accordingly, an improved technique for polling-based signaling between threads would be advantageous.
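The conventional software polling loop described above can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not an implementation taken from this document: the names `SharedWord`, `SIGNAL_VALUE`, `wait_for_signal`, `send_signal`, and `run_demo` are hypothetical, and a Python object attribute stands in for the value stored at the predetermined memory address. The waiting thread counts each reload of the shared value, which roughly corresponds to the repeated memory accesses the loop generates.

```python
import threading
import time

SIGNAL_VALUE = 1  # hypothetical "predetermined value" that signals the waiter


class SharedWord:
    """Stands in for the value at the predetermined memory address."""

    def __init__(self):
        self.value = 0


def wait_for_signal(word, stats):
    """Software polling loop: reload the shared value until it matches."""
    loads = 0
    while True:
        observed = word.value  # one "memory access" per loop iteration
        loads += 1
        if observed == SIGNAL_VALUE:
            break
    stats["loads"] = loads
    stats["observed"] = observed


def send_signal(word, delay_s):
    """Signaling thread: store the predetermined value after a delay."""
    time.sleep(delay_s)
    word.value = SIGNAL_VALUE


def run_demo(delay_s=0.01):
    """Run one waiter and one signaler; return the waiter's statistics."""
    word = SharedWord()
    stats = {}
    waiter = threading.Thread(target=wait_for_signal, args=(word, stats))
    signaler = threading.Thread(target=send_signal, args=(word, delay_s))
    waiter.start()
    signaler.start()
    waiter.join()
    signaler.join()
    return stats


if __name__ == "__main__":
    result = run_demo()
    print("observed value:", result["observed"])
    print("loads performed while waiting:", result["loads"])
```

The number of loads varies from run to run and generally grows with the delay before the signal is sent. Every one of those loads is work performed only to learn that nothing has changed yet, which is the bus traffic and processor occupancy cost noted above.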