Modern high-performance microprocessors can have a number of execution cores and multiple levels of cache storage. Thus, there is an ever-increasing demand for higher interconnect bandwidth between these components. One technique to provide such higher interconnect bandwidth involves distributed cache partitioning, with parallel access to multiple portions of the distributed cache through a shared interconnect.
Another aspect of some modern high-performance microprocessors is support for multithreaded software and hardware, with thread synchronization through shared memory. An example of two instructions that provide thread synchronization through shared memory is the MONITOR and MWAIT instructions of Intel Corporation's SSE3 instruction set. MONITOR defines an address range used to monitor write-back stores. MWAIT is used to indicate that an execution thread is waiting for data to be written to the address range defined by the MONITOR instruction. The thread can then transition into a low-power state and wait to be notified by a monitor-wake event when data is written to the monitored address range.
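The handshake described above can be illustrated with a toy, single-threaded software model. This is only a sketch under stated assumptions: the type `monitor_model` and the functions `model_monitor`, `model_mwait`, `model_store`, and `model_selfcheck` are hypothetical names introduced for illustration, and the model does not execute the actual MONITOR or MWAIT instructions or reflect any particular hardware implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of the MONITOR/MWAIT handshake. All names are hypothetical. */
typedef struct {
    uintptr_t base;   /* start of the monitored address range */
    size_t    len;    /* model parameter; real hardware fixes the range size */
    bool      armed;  /* set by MONITOR, cleared by a store to the range */
    bool      asleep; /* waiting thread is in the low-power state */
} monitor_model;

/* MONITOR: define the address range to watch and arm the monitor. */
void model_monitor(monitor_model *m, const void *addr, size_t len) {
    m->base = (uintptr_t)addr;
    m->len = len;
    m->armed = true;
    m->asleep = false;
}

/* MWAIT: enter the wait state only if the monitor is still armed.
   Returns true if the thread went to sleep. */
bool model_mwait(monitor_model *m) {
    m->asleep = m->armed;
    return m->asleep;
}

/* A store by another agent. A store inside the monitored range disarms the
   monitor; returns true if a sleeping thread was woken (monitor-wake event). */
bool model_store(monitor_model *m, const void *addr) {
    uintptr_t a = (uintptr_t)addr;
    if (!m->armed || a < m->base || a - m->base >= m->len)
        return false;
    bool woke = m->asleep;
    m->armed = false;
    m->asleep = false;
    return woke;
}

/* Usage: exercises both orderings of the store relative to MWAIT. */
void model_selfcheck(void) {
    int flag = 0, other = 0;
    monitor_model m;

    model_monitor(&m, &flag, sizeof flag);
    assert(model_mwait(&m));           /* armed, so the thread sleeps */
    assert(!model_store(&m, &other));  /* store outside the range: no wake */
    assert(model_store(&m, &flag));    /* store inside the range: wake */

    model_monitor(&m, &flag, sizeof flag);
    assert(!model_store(&m, &flag));   /* store lands before MWAIT: nobody asleep */
    assert(!model_mwait(&m));          /* monitor already triggered: no sleep */
}
```

The second half of `model_selfcheck` shows why the monitor is armed before the wait: a store that arrives between the two steps disarms the monitor, so the subsequent wait falls through instead of sleeping on an event that has already happened.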
When MONITOR and MWAIT requests are reserved for a privilege level of 0 (zero), additional challenges present themselves. For example, any thread synchronization and/or power management using MONITOR and MWAIT from less privileged software would involve system calls, which may introduce bottlenecks and adversely impact the performance of the thread synchronization and/or power management.
To date, efficient techniques for implementing thread synchronization through MONITOR and MWAIT instructions that address these challenges, and potential solutions to such performance-limiting issues, as well as design, validation, and other complexities, have not been adequately explored.