1. Field of the Invention
This invention relates to computer networks. More particularly, this invention relates to inter-process communication over computer networks.
2. Description of the Related Art
The meanings of certain acronyms and abbreviations used herein are given in Table 1.
TABLE 1
Acronyms and Abbreviations
CPU     Central Processing Unit
NAK     Negative Acknowledgement
NIC     Network Interface Controller
PCIe    Peripheral Component Interconnect Express
RDMA    Remote Direct Memory Access
RMW     Read-Modify-Write
RNR     Resource Not Ready
Despite many proposals for lock-free resource allocation, locks are still commonly used to synchronize between execution threads or processes accessing a shared resource (also known as a “protected region”). Generally speaking, a thread trying to access a shared resource is required to make sure that it is safe to do so. Checking for safety is done by observing the value of the lock variable. Software convention defines when the lock is free and access to the shared resource is safe.
If the observed value of the lock variable indicates that the lock is free, the thread sets the lock variable to a value noting that the lock is taken. Reading and checking the lock value, and writing that the lock is taken, must happen atomically to prevent race conditions in which multiple threads try to acquire the lock concurrently.
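The atomic check-and-set described above can be sketched as a compare-and-swap on a single lock word. This is an illustrative sketch only: the values FREE = 0 and TAKEN = 1, the AtomicCell class, and the function names are assumptions made for the example, and an internal mutex stands in for the atomicity that hardware or a NIC would provide.

```python
import threading

FREE, TAKEN = 0, 1  # assumed convention; the text leaves the values to software


class AtomicCell:
    """Hypothetical single word whose compare-and-swap is atomic."""

    def __init__(self, value):
        self._value = value
        self._guard = threading.Lock()  # stands in for hardware atomicity

    def compare_and_swap(self, expected, new):
        """Store `new` only if the cell holds `expected`; return the old value."""
        with self._guard:
            old = self._value
            if old == expected:
                self._value = new
            return old

    def load(self):
        with self._guard:
            return self._value


def try_acquire(lock_word):
    """Return True if this caller changed the lock word from FREE to TAKEN."""
    return lock_word.compare_and_swap(FREE, TAKEN) == FREE


def race(num_threads=8):
    """Let several threads attempt the lock at once.

    Returns (number of threads that acquired the lock, final lock value).
    """
    lock_word = AtomicCell(FREE)
    results = []
    results_guard = threading.Lock()

    def worker():
        won = try_acquire(lock_word)
        with results_guard:
            results.append(won)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results), lock_word.load()
```

Because the read, the comparison, and the write occur as one indivisible step, only one of the competing threads can observe FREE and install TAKEN. If the read and the write were separate steps, two threads could both observe FREE and both conclude that they hold the lock.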
Turning now to the drawings, reference is initially made to FIG. 1, which is an event diagram 10 illustrating a method of lock access in accordance with the prior art. A computational thread (initiator 12) wishing to access shared resources over the network sends a lock acquisition command, i.e., an atomic read-modify-write (RMW) lock command 14 (atomic compare-and-swap is an example), to a network interface controller, initiator NIC 16, that provides network access to the initiator 12.
The RMW lock command 14 can execute within initiator NIC 16 or can be transferred over a bus, e.g., a peripheral component interconnect express (PCIe) bus, and be executed by the central processing unit (CPU) of the initiator 12. In the example of FIG. 1, the initiator NIC 16 relays the RMW lock command 14 over a network to a target NIC 18, which executes the command on target memory 20 (arrows 22, 24), thereby establishing a lock on a region of the target memory 20. The result of the command execution is transmitted as atomic response 26 from the target NIC 18 back to the initiator 12 via the initiator NIC 16.
The initiator 12 waits for the network access to complete, evaluates the atomic response 26, and concludes that the protected region of the target memory 20 is available to it. The protected region is of course locked against other processes. The initiator 12 then proceeds to access the protected region of the target memory 20 by issuing at least one RDMA access request 28, which is relayed via the NICs 16, 18 and reaches the target memory 20 as access request 30. Once the access operation in the protected region is complete, the initiator 12 releases the lock by writing a new value into it as RDMA access request 32, which is transmitted and executed as RDMA write operation 34.
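The sequence of FIG. 1 (acquire the lock, access the protected region, release the lock) can be modeled with a toy simulation. The sketch below is not an RDMA implementation: TargetMemory, locked_write, the dictionary keys, and the FREE/TAKEN values are all hypothetical names chosen for illustration, and each method call merely stands in for one request relayed through the NICs.

```python
FREE, TAKEN = 0, 1  # assumed lock-word convention


class TargetMemory:
    """Toy model of target memory reached through the target NIC."""

    def __init__(self):
        self.words = {"lock": FREE, "data": 0}
        self.requests = 0  # requests that crossed the simulated network

    def compare_and_swap(self, key, expected, new):
        """Models the RMW lock command; the return value is the atomic response."""
        self.requests += 1
        old = self.words[key]
        if old == expected:
            self.words[key] = new
        return old

    def write(self, key, value):
        """Models an RDMA write request."""
        self.requests += 1
        self.words[key] = value


def locked_write(target, value):
    """Acquire the lock, write the protected word, release the lock.

    Returns False without touching the data if the lock was not free.
    """
    response = target.compare_and_swap("lock", FREE, TAKEN)
    if response != FREE:
        return False  # lock held by another process
    target.write("data", value)  # access to the protected region
    target.write("lock", FREE)   # release by writing a new lock value
    return True
```

In this model, even an uncontended update of a single word costs three requests: one to take the lock, one for the access itself, and one to release the lock.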
Reference is now made to FIG. 2, which is an event diagram 36 illustrating the method of lock access shown in FIG. 1 in which the requested resource is not immediately available, in accordance with the prior art. After RMW lock command 14 and the read request (arrow 22) are issued, the write request to establish a lock cannot be fulfilled as the resource is already locked. This situation is reported in atomic response 38. The initiator 12 then makes a second attempt to acquire the lock, by issuing another instance of RMW lock command 14, which now succeeds. However, in general several attempts may be necessary before RMW lock command 14 ultimately succeeds, after which the events proceed in the manner described above with respect to FIG. 1. The details are not repeated in the interest of brevity.
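The repeated attempts of FIG. 2 amount to a retry loop around the RMW lock command. The following sketch makes that loop explicit under stated assumptions: ContendedLock is a hypothetical stand-in for a lock word that some other holder releases after a given number of failed attempts, and max_attempts is an illustrative bound that does not appear in the description above.

```python
FREE, TAKEN = 0, 1  # assumed lock-word convention


class ContendedLock:
    """Lock word that a simulated other holder frees after `busy_attempts` failures."""

    def __init__(self, busy_attempts):
        self.value = TAKEN if busy_attempts > 0 else FREE
        self._busy_attempts = busy_attempts

    def compare_and_swap(self, expected, new):
        """Return the old value; install `new` only if the old value was `expected`."""
        old = self.value
        if old == expected:
            self.value = new
        elif self._busy_attempts > 0:
            self._busy_attempts -= 1
            if self._busy_attempts == 0:
                self.value = FREE  # the other holder releases the lock
        return old


def acquire_with_retry(lock, max_attempts):
    """Reissue the lock command until it succeeds or the attempt budget runs out.

    Returns the number of attempts used, or None if the lock was never acquired.
    """
    for attempt in range(1, max_attempts + 1):
        if lock.compare_and_swap(FREE, TAKEN) == FREE:
            return attempt
    return None
```

Each failed attempt in this loop corresponds to a full network round trip that yields no progress, and the initiator remains occupied with the lock for the duration.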
The synchronization management system represented by the event diagram 36 is sensitive to lock contention, and the above-described operations can incur considerable overhead. In the case of remote transactions, at least one round trip over the network is needed to make sure that the lock is actually taken, and during this time the CPU is busy managing the lock and cannot perform other tasks.