Cluster computing systems typically include computer interconnects that provide Remote Direct Memory Access (RDMA) functionality. To provide reliable remote memory operations, entities in the cluster may employ a reliable transfer protocol, which may be implemented in software, firmware, hardware, or any combination thereof.
Some memory operations, such as a remote memory read, are idempotent: they may be executed more than once and still achieve the same result. Other memory operations, such as atomic operations, are not idempotent and must execute exactly once, because each execution modifies the target memory and returns a value that depends on its prior contents. To execute atomic operations reliably, the responder of the reliable transfer protocol should maintain state associated with previously executed atomic operations. This state includes the information required to regenerate the result of each operation that was sent back to the requester.
If a request packet is lost or damaged in transit from the requester to the responder, the responder either discards the damaged packet or never receives it. In either case, the responder typically does not send a response to the requester. Instead, the reliable transfer protocol on the requester system times out and resends the packet. The resent packet eventually reaches the responder, which then processes the atomic operation, preserves the required state, and sends a response containing the result of the atomic operation to the requester.
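The difference between idempotent and non-idempotent operations can be illustrated with a minimal Python sketch. It assumes a hypothetical `Memory` class standing in for a remote memory region and uses fetch-and-add as an example of an atomic operation; the text above does not name a specific atomic operation, so this choice is illustrative only.

```python
class Memory:
    """Hypothetical remote memory region addressed by integer offsets."""

    def __init__(self):
        self.words = {}

    def read(self, addr):
        # Idempotent: repeating the read returns the same value and changes nothing.
        return self.words.get(addr, 0)

    def fetch_and_add(self, addr, delta):
        # Not idempotent: each execution modifies memory, and the returned
        # "old value" differs from one execution to the next.
        old = self.words.get(addr, 0)
        self.words[addr] = old + delta
        return old


mem = Memory()
mem.words[0x10] = 5

# Executing a read twice yields the same result.
assert mem.read(0x10) == mem.read(0x10) == 5

# Naively executing fetch-and-add twice (for example, re-executing a
# retransmitted request) returns different results and applies delta twice.
first = mem.fetch_and_add(0x10, 1)
second = mem.fetch_and_add(0x10, 1)
print(first, second, mem.read(0x10))  # 5 6 7
```

The second call both returns a different value and leaves memory incremented twice, which is why a responder cannot simply re-execute an atomic request it has already processed.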
If the response is lost or damaged, the requester does not receive a valid response, so the reliable transfer protocol on the requester system times out and resends the request packet. When the responder processes this retransmitted request, it can detect that the request is a duplicate and use the saved information to regenerate the correct result of the previously executed atomic operation without executing the operation a second time.
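Both failure cases, a lost request and a lost response, can be simulated with a short Python sketch. It assumes that duplicates are identified by a per-request sequence number (`seq`) and that the responder keeps saved results in an unbounded dictionary. Both are simplifications: a real protocol would typically bound this state, for example to a window of recent sequence numbers. `Request`, `Responder`, `send_with_retry`, and the `drop_request`/`drop_response` fault-injection sets are hypothetical names introduced for illustration, and a timeout is modeled simply as moving on to the next attempt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    seq: int    # hypothetical sequence number used for duplicate detection
    addr: int
    delta: int  # fetch-and-add operand (illustrative atomic operation)


class Responder:
    def __init__(self):
        self.memory = {}
        # Saved results of already executed atomic operations, keyed by seq.
        # Unbounded here only to keep the sketch short.
        self.saved_results = {}
        self.executions = 0

    def handle(self, req):
        if req.seq in self.saved_results:
            # Duplicate request: regenerate the response from saved state
            # instead of executing the atomic operation again.
            return self.saved_results[req.seq]
        old = self.memory.get(req.addr, 0)
        self.memory[req.addr] = old + req.delta
        self.executions += 1
        self.saved_results[req.seq] = old
        return old


def send_with_retry(responder, req, drop_request, drop_response, max_tries=5):
    """Requester side: resend on timeout until a valid response arrives.

    drop_request and drop_response are sets of attempt numbers on which the
    simulated network loses the request or the response, respectively.
    """
    for attempt in range(max_tries):
        if attempt in drop_request:
            continue  # request lost or damaged: no response, requester times out
        result = responder.handle(req)
        if attempt in drop_response:
            continue  # response lost or damaged: requester times out and resends
        return result
    raise TimeoutError("no valid response received")


r = Responder()
r.memory[0x10] = 5
# Attempt 0: request lost. Attempt 1: executed, but response lost.
# Attempt 2: recognized as a duplicate, answered from saved state.
result = send_with_retry(r, Request(seq=1, addr=0x10, delta=1),
                         drop_request={0}, drop_response={1})
print(result, r.memory[0x10], r.executions)  # 5 6 1
```

Although the request reaches the responder twice, the atomic operation executes only once, and the requester still receives the original result (5).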