Modern high-performance microprocessors may have multiple execution cores and multiple levels of cache storage, which creates an ever-increasing demand for higher interconnect bandwidth between these components. One technique for providing such bandwidth is distributed cache partitioning, in which multiple portions of a distributed cache can be accessed in parallel through a shared interconnect.
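As a rough illustration of parallel access to a partitioned cache, the following Python sketch maps each address to a cache portion ("slice") by interleaving consecutive cache lines. It then shows that requests to different slices can be served in the same cycle, while requests that collide on one slice are serialized. The line size, the slice count, one-request-per-slice-per-cycle, and the names `slice_for` and `schedule` are all hypothetical assumptions; actual processors may use a different, often undisclosed, address hash.

```python
from typing import Dict, List, Tuple

LINE_SIZE = 64   # assumed cache-line size in bytes
NUM_SLICES = 4   # hypothetical number of distributed cache portions


def slice_for(addr: int) -> int:
    """Map an address to the cache slice that owns its line.

    Hypothetical mapping: consecutive cache lines are interleaved
    round-robin across slices.
    """
    return (addr // LINE_SIZE) % NUM_SLICES


def schedule(requests: List[int]) -> Tuple[List[int], List[int]]:
    """Split one cycle's requests into (served, deferred).

    Assumes each slice can serve one request per cycle: requests to
    distinct slices proceed in parallel, and any additional requests
    to an already-busy slice are deferred to a later cycle.
    """
    per_slice: Dict[int, List[int]] = {}
    for addr in requests:
        per_slice.setdefault(slice_for(addr), []).append(addr)
    served = [queue[0] for queue in per_slice.values()]
    deferred = [a for queue in per_slice.values() for a in queue[1:]]
    return served, deferred
```

For example, `schedule([0x000, 0x040, 0x080, 0x0C0])` touches four different slices and serves all four requests at once, whereas `schedule([0x000, 0x100])` maps both addresses to slice 0, so the second request is deferred.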
Some modern high-performance microprocessors also support multithreaded software and hardware, with threads synchronizing through shared memory. Intel Corporation's MONITOR and MWAIT instructions, part of the SSE3 instruction set, are one example of instructions that provide thread synchronization through shared memory. MONITOR defines an address range to be monitored for write-back stores. MWAIT indicates that an execution thread is waiting for data to be written to the address range defined by the MONITOR instruction. The thread can then transition into a low-power state and wait until a monitor-wake event notifies it that data has been written to the monitored address range.
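The MONITOR/MWAIT interaction can be modeled in software to make the sequence concrete. The Python sketch below is a toy, single-threaded model under stated assumptions, not a description of any processor's implementation. The monitored range is assumed to be one 64-byte line, `MonitorModel` and its methods are hypothetical names, and a store that hits an armed range before MWAIT executes is assumed to make MWAIT fall through immediately instead of entering the low-power state.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

MONITOR_LINE = 64  # assumed size of the monitored address range, in bytes


@dataclass
class ThreadMonitor:
    base: Optional[int] = None  # start of the armed range; None if unarmed
    triggered: bool = False     # a store hit the range after MONITOR
    sleeping: bool = False      # thread is parked in MWAIT (low-power state)


class MonitorModel:
    """Toy model of MONITOR/MWAIT semantics (hypothetical, not real hardware)."""

    def __init__(self) -> None:
        self.threads: Dict[int, ThreadMonitor] = {}

    def monitor(self, tid: int, addr: int) -> None:
        # Arm the line-aligned range containing addr for this thread,
        # clearing any earlier triggered or sleeping state.
        base = addr - addr % MONITOR_LINE
        self.threads[tid] = ThreadMonitor(base=base)

    def mwait(self, tid: int) -> bool:
        """Return True if the thread enters the low-power wait state."""
        t = self.threads.get(tid)
        if t is None or t.base is None or t.triggered:
            return False  # unarmed or already triggered: fall through
        t.sleeping = True
        return True

    def store(self, addr: int) -> List[int]:
        """Apply a write-back store and return the ids of threads it wakes."""
        woken = []
        for tid, t in self.threads.items():
            if t.base is not None and t.base <= addr < t.base + MONITOR_LINE:
                t.triggered = True
                if t.sleeping:
                    t.sleeping = False
                    woken.append(tid)  # models the monitor-wake event
        return woken
```

In this model, a thread that calls `monitor(1, 0x1008)` and then `mwait(1)` sleeps until a store lands anywhere in the range from 0x1000 to 0x103F. Because every store is checked against every armed thread in one shared table, the model also hints at why a single central record of monitors can become a point of contention when stores arrive at many cache portions in parallel.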
When these two techniques are used in combination, additional challenges arise. For example, centrally tracking the monitor requests of all active execution threads, while still permitting parallel access to multiple portions of the distributed cache, may introduce bottlenecks and degrade the performance of distributed cache access through the shared interconnect. To date, efficient techniques for implementing thread synchronization with the MONITOR and MWAIT instructions in a distributed cache architecture have not been fully explored.