A shared memory system typically includes multiple processing nodes connected by a communications medium (e.g., a bus, a network, etc.). Each processing node includes a processor and local memory. In general, a processor can access its local memory faster than non-local memory (i.e., the local memory of another processor). SMP (symmetric multiprocessing), ccNUMA (cache-coherent non-uniform memory access) and NUMA (non-cache-coherent non-uniform memory access) are examples of conventional multiprocessor architectures which employ shared-memory schemes.
Applications that run on these shared memory systems typically deploy data structures within this shared memory to share access to the data with other application instances. Applications construct and employ their own locking mechanisms to prevent multiple application instances from concurrently accessing and modifying their shared data and thus destroying data integrity. Before accessing the shared data, the application would, in the traditional manner, first acquire the application-lock protecting access to the data, possibly waiting for the lock to be freed by some other application instance. After acquiring this application-lock, the application could then access the shared data.
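The acquire-then-access pattern described above can be sketched as follows. This is a minimal illustration only: threads within one process stand in for application instances, a `threading.Lock` stands in for the application-lock, and the names `SharedRecord`, `update` and `run_instances` are hypothetical and do not come from any particular system.

```python
import threading


class SharedRecord:
    """Hypothetical shared data structure guarded by an application-lock."""

    def __init__(self):
        self.app_lock = threading.Lock()  # stand-in for the application-lock
        self.value = 0

    def update(self):
        # Acquire the application-lock first; this blocks while another
        # instance holds it, and the lock is released when the block exits.
        with self.app_lock:
            current = self.value      # read the shared data
            self.value = current + 1  # modify the shared data


def run_instances(record, instances=4, updates=1000):
    """Run several hypothetical application instances against one record."""
    def instance():
        for _ in range(updates):
            record.update()

    workers = [threading.Thread(target=instance) for _ in range(instances)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return record.value
```

Because every read-modify-write of `value` happens while the lock is held, no update is lost regardless of how the instances interleave.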
By way of example, on one traditional NUMA system, an application running on a first node could have an application-lock located in a shared page residing in the local memory of a second node. As a side effect of the application requesting this remote application-lock, the NUMA system's coherency mechanism on the processor of the first node sends a message through the communications medium of the system to the second node, requesting the subsystem page-lock on the page containing the application-lock. The processor of the second node responds to the message by acquiring the subsystem page-lock on behalf of the first node and notifying the first node that the page is locked. The processor of the first node then sends a message to the second node requesting the locked page, and the second node responds by providing the locked page to the first node through the communications medium. The processor of the first node then attempts to acquire the application-lock within that page. Once the application-lock is acquired, the processor of the first node sends the newly modified page back to the second node through the communications medium.
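The message exchange described above can be traced with a small simulation. This is a sketch under stated assumptions, not an actual NUMA coherency implementation: the `Page` and `Node` classes, the `acquire_remote_app_lock` function and the wording of the logged messages are all hypothetical, and the communications medium is modeled as a plain list of log entries.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Page:
    """Hypothetical shared page holding one application-lock."""
    app_lock_owner: Optional[str] = None   # None means the lock is free
    page_lock_owner: Optional[str] = None  # None means the page is unlocked


@dataclass
class Node:
    """Hypothetical processing node with local memory (page id -> Page)."""
    name: str
    memory: Dict[int, Page] = field(default_factory=dict)


def acquire_remote_app_lock(first: Node, second: Node, page_id: int,
                            log: List[str]) -> bool:
    """Trace the messages needed for `first` to take a lock homed on `second`."""
    home_page = second.memory[page_id]

    log.append(f"{first.name}->{second.name}: request page-lock")
    if home_page.page_lock_owner is not None:
        return False  # another node holds the page-lock; the caller must wait
    home_page.page_lock_owner = first.name
    log.append(f"{second.name}->{first.name}: page is locked")

    log.append(f"{first.name}->{second.name}: request locked page")
    local_copy = Page(home_page.app_lock_owner, home_page.page_lock_owner)
    log.append(f"{second.name}->{first.name}: page contents")

    # Attempt to acquire the application-lock within the local copy.
    if local_copy.app_lock_owner is not None:
        return False
    local_copy.app_lock_owner = first.name

    # Send the newly modified page back to its home node.
    home_page.app_lock_owner = local_copy.app_lock_owner
    log.append(f"{first.name}->{second.name}: modified page")
    return True
```

In this model a single uncontended acquisition of the application-lock costs five messages across the medium, and the page-lock is still held by the first node when the function returns.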
Eventually, the application explicitly releases the application-lock in the traditional manner. Additionally, the application provides a second explicit unlock instruction to the locking subsystem directing the locking subsystem to release the page-lock. In response, the locking subsystem clears the central locking data structure, thus enabling other nodes to acquire the page-lock in a similar manner.
It should be understood that the nodes in the shared memory system employ a sophisticated locking subsystem to coordinate accesses among multiple nodes competing for access to the shared memory page. This locking subsystem, which is separate from other node subsystems such as the node's virtual memory (VM) subsystem and the application's locking logic, is an integral part of the shared memory coherence mechanism, and is page-granular.
It should be further understood that, while the page is locked on behalf of the first node, only the first node has access to the page, and other nodes of the system are unable to modify the page. If another node wishes to modify the same page, that other node must wait until the page's lock is released (e.g., until the first node completes its modification of the page, returns the page to the second node, and relinquishes the page-lock).
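The exclusivity and page granularity of the page-lock can be illustrated with a toy model of a central locking data structure. This is a sketch only: the `PageLockTable` class, its methods, and the 4096-byte `PAGE_SIZE` are assumptions made for illustration and are not taken from any particular locking subsystem.

```python
PAGE_SIZE = 4096  # assumed page size in bytes, for illustration only


class PageLockTable:
    """Hypothetical central locking data structure: page number -> owner."""

    def __init__(self):
        self._owners = {}

    @staticmethod
    def page_of(address):
        # Every address within the same page maps to the same page-lock.
        return address // PAGE_SIZE

    def try_lock(self, address, node):
        page = self.page_of(address)
        if page in self._owners:
            return False  # page is locked; the requester must wait
        self._owners[page] = node
        return True

    def unlock(self, address, node):
        page = self.page_of(address)
        if self._owners.get(page) != node:
            raise ValueError("page-lock is not held by this node")
        del self._owners[page]  # clear the entry so other nodes can lock
```

For example, if one node locks the page containing address 100, a request from another node for address 200 fails until the first node unlocks, even though the two addresses are distinct, because both fall within the same page.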
Similarly, on one traditional ccNUMA system, an application running on a first node could have an application-lock located in a shared cache line residing in the local memory of a second node. As a side effect of the application requesting this application-lock, the cache coherency mechanism in the first and second nodes enables coherent access to the shared cache line, which moves the cache line from the second node to the first node through the communications medium of the system. The processor of the first node then attempts to acquire the application-lock within the cache line.
It should be understood that the nodes in a ccNUMA system employ a sophisticated cache coherence subsystem to coordinate accesses among multiple nodes competing for access to the shared memory cache line. This subsystem is separate from other node subsystems such as the node's virtual memory (VM) subsystem and the application's locking logic.
Eventually, the application explicitly releases the application-lock in the traditional manner.
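The ccNUMA sequence described above, in which the cache line migrates to the requesting node before the lock is attempted, can be modeled as follows. This is a simplified sketch: real cache coherence is performed transparently by hardware, and the `CoherentLine` class, its `transfers` counter and its method names are hypothetical devices for making the line movement visible.

```python
class CoherentLine:
    """Hypothetical cache line containing a single application-lock word."""

    def __init__(self, home):
        self.holder = home   # node that currently holds the line
        self.lock_word = 0   # 0 = application-lock free, 1 = held
        self.transfers = 0   # number of moves across the medium

    def _bring_to(self, node):
        # Model of the coherence mechanism: move the line to the
        # accessing node if it is held elsewhere.
        if self.holder != node:
            self.holder = node
            self.transfers += 1

    def try_acquire(self, node):
        self._bring_to(node)
        if self.lock_word == 0:
            self.lock_word = 1
            return True
        return False

    def release(self, node):
        self._bring_to(node)
        self.lock_word = 0
```

In this model every access from a node that does not currently hold the line, including a failed acquisition attempt, moves the line across the medium.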