A shared memory system typically includes multiple processing nodes connected by a communications medium (e.g., a bus, a network, etc.). Each processing node includes a processor and local memory. In general, a processor can access its local memory faster than non-local memory (i.e., the local memory of another processor). SMP (symmetric multiprocessing), ccNUMA (cache-coherent non-uniform memory access) and NUMA (non-uniform memory access without cache coherence) are examples of conventional multiprocessor architectures which employ shared-memory schemes.
One conventional NUMA shared memory system uses a page-granular data-sharing approach in which the nodes exchange and modify shared data one page at a time. For example, suppose that the processor of a first node of the system wishes to modify data on a particular shared page residing in the local memory of a second node. To modify the page, the processor of the first node sends the second node a message requesting the page through the communications medium of the system. The processor of the second node responds to the message by locking the page and sending the page to the first node through the communications medium. The processor of the first node then modifies the page and sends the modified page back to the second node through the communications medium. Upon receipt of the modified page, the processor of the second node releases the lock so that the page is available again.
It should be understood that, while the page is locked on behalf of the first node, other nodes of the system are unable to modify the page. If another node wishes to modify the same page, that other node must wait until the lock on the page is released (e.g., until the first node completes its modification of the page, returns the page to the second node, and relinquishes the lock) before that other node can have a turn at locking and modifying the page.
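The page-granular exchange described above can be sketched in a few lines of Python. This is only an illustrative model, not the implementation of any particular system: the names HomeNode, request_page, return_page and increment_remote, the 4096-byte page size, and the use of threads to stand in for nodes are all assumptions made for the example.

```python
import threading

PAGE_SIZE = 4096  # assumed page size in bytes


class HomeNode:
    """Hypothetical model of the node whose local memory holds the shared pages."""

    def __init__(self, num_pages):
        self.pages = [bytearray(PAGE_SIZE) for _ in range(num_pages)]
        self.locks = [threading.Lock() for _ in range(num_pages)]

    def request_page(self, page_no):
        # Lock the page on behalf of the requester, then send it a copy.
        # A second requester blocks here until the page is returned.
        self.locks[page_no].acquire()
        return bytes(self.pages[page_no])

    def return_page(self, page_no, data):
        # Store the modified page and release the lock so that the page
        # is available to other nodes again.
        self.pages[page_no][:] = data
        self.locks[page_no].release()


def increment_remote(home, page_no, offset):
    """Requesting node: fetch the whole page, change one byte, send it back."""
    page = bytearray(home.request_page(page_no))
    page[offset] = (page[offset] + 1) % 256
    home.return_page(page_no, page)
```

In this sketch the entire page travels in both directions even though only one byte changes, and any other requester of the same page waits inside request_page until return_page runs, which mirrors the waiting behavior described above.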
One conventional ccNUMA shared memory system uses a cache-line-granular data-sharing approach in which the nodes exchange and modify shared data one cache line at a time. For example, suppose that the processor of a first node of the system wishes to modify data on a particular shared cache line residing in the local memory of a second node. To modify the cache line, hardware logic within the first node sends the second node a message requesting the cache line through the communications medium of the system. The hardware logic within the second node responds to the message by sending the cache line to the first node through the communications medium. The hardware logic on all nodes maintains the cache coherency of all shared cache lines.
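The cache-line-granular exchange can be modeled in software as follows. The text above does not say how the hardware logic keeps the cache lines coherent, so this sketch assumes a simple directory-based write-invalidate scheme purely for illustration. The names Node, Interconnect, read and write, and the 64-byte line size, are hypothetical.

```python
LINE_SIZE = 64  # assumed cache-line size in bytes


class Node:
    """Hypothetical processing node: local memory plus a cache of shared lines."""

    def __init__(self, node_id, num_lines=0):
        self.node_id = node_id
        self.memory = [bytearray(LINE_SIZE) for _ in range(num_lines)]
        self.cache = {}                                   # (home_id, line_no) -> line copy
        self.sharers = [set() for _ in range(num_lines)]  # directory: nodes caching each line
        self.owner = [None] * num_lines                   # directory: node holding a modified copy


class Interconnect:
    """Stands in for the coherence logic and the communications medium."""

    def __init__(self, nodes):
        self.nodes = {n.node_id: n for n in nodes}

    def _write_back(self, home, line_no):
        # If some node holds a modified copy, copy it back to the home memory.
        owner_id = home.owner[line_no]
        if owner_id is not None:
            key = (home.node_id, line_no)
            home.memory[line_no][:] = self.nodes[owner_id].cache[key]
            home.owner[line_no] = None

    def read(self, requester_id, home_id, line_no, offset):
        home, req = self.nodes[home_id], self.nodes[requester_id]
        key = (home_id, line_no)
        if key not in req.cache:
            self._write_back(home, line_no)
            req.cache[key] = bytearray(home.memory[line_no])
            home.sharers[line_no].add(requester_id)
        return req.cache[key][offset]

    def write(self, requester_id, home_id, line_no, offset, value):
        home, req = self.nodes[home_id], self.nodes[requester_id]
        key = (home_id, line_no)
        if home.owner[line_no] != requester_id:
            self._write_back(home, line_no)
            # Invalidate every other cached copy before granting ownership.
            for other in home.sharers[line_no] - {requester_id}:
                del self.nodes[other].cache[key]
            req.cache[key] = bytearray(home.memory[line_no])
            home.sharers[line_no] = {requester_id}
            home.owner[line_no] = requester_id
        req.cache[key][offset] = value
```

For example, with nodes 1 and 3 sharing a line that resides in the memory of node 2, a write by node 1 removes the copy cached at node 3, and a later read by node 3 first causes the modified line to be written back to node 2. Only a 64-byte line moves on each transfer, rather than an entire page.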