Multi-core processors, often called Chip Multiprocessors (CMPs), contain multiple processor cores which, in some embodiments, may be connected to an on-die shared cache through a shared cache scheduler and coherence controller. Multi-core multi-processor systems are becoming increasingly common in commercial server systems because of their performance, scalability, and modular design. The coherence controller and the shared cache may be either centralized or distributed among the cores. The shared cache may be designed as an inclusive cache to provide snoop filtering.
These multi-core multi-processor systems result in a large number of concurrent threads executing in parallel. To enable a high-performance parallel execution environment, an efficient implementation of thread synchronization primitives is needed. In particular, a need exists to implement monitor primitives when the processor employs an inclusive shared last level cache.
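By way of illustration only, the snoop filtering provided by an inclusive shared cache may be sketched as a small software model. The sketch relies on the inclusion property: a line held in any core's private cache is also present in the shared cache, so a lookup that misses in the shared cache implies that no core needs to be probed. The class and method names (InclusiveSharedCache, core_fill, snoop, evict) are hypothetical and do not describe any particular hardware implementation.

```python
class InclusiveSharedCache:
    """Toy model of an inclusive shared last level cache (hypothetical names)."""

    def __init__(self, num_cores):
        self.num_cores = num_cores
        # Maps a line address to the set of core ids that may hold the line.
        self.lines = {}

    def core_fill(self, core, addr):
        # Inclusion: a line reaches a core's private cache only after it is
        # allocated in the shared cache, which records the core as a holder.
        if not 0 <= core < self.num_cores:
            raise ValueError("unknown core id")
        self.lines.setdefault(addr, set()).add(core)

    def snoop(self, addr):
        # Returns the cores that must be probed for this address. A miss in
        # the shared cache yields an empty list, so the snoop is filtered.
        return sorted(self.lines.get(addr, set()))

    def evict(self, addr):
        # Evicting a line from the shared cache requires back-invalidating
        # every private copy; returns the cores that must be invalidated.
        return sorted(self.lines.pop(addr, set()))
```

In this model, a snoop for an address that was never filled returns an empty list, while a snoop for a line filled by cores 0 and 2 returns only those two cores rather than requiring a probe of every core.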