Message Passing Interface (MPI) is a communication protocol that is widely used for exchanging messages among processes in high-performance computing (HPC) systems. Messages sent from a sending process to a destination process are marked with an identifying label, referred to as a tag. Destination processes post buffers in local memory that are similarly marked with tags. When a message is received by the receiver (i.e., the host computer on which the destination process is running), the message is stored in a buffer whose tag matches the message tag. The process of finding a buffer with a matching tag for the received message is called tag matching.
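The tag-matching step described above can be illustrated with a minimal sketch. It is not taken from any MPI implementation: the names `PostedBuffer`, `TagMatcher`, `post` and `deliver` are hypothetical, matching is done on the tag alone, and buffers are assumed to be searched in the order in which they were posted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PostedBuffer:
    tag: int                      # label the destination process attached to this buffer
    data: Optional[bytes] = None  # filled in when a matching message arrives


@dataclass
class TagMatcher:
    # Buffers posted by the destination process, oldest first (assumed ordering).
    posted: List[PostedBuffer] = field(default_factory=list)

    def post(self, tag: int) -> PostedBuffer:
        """Post an empty receive buffer marked with `tag`."""
        buf = PostedBuffer(tag)
        self.posted.append(buf)
        return buf

    def deliver(self, tag: int, payload: bytes) -> Optional[PostedBuffer]:
        """Tag matching: store the message in the oldest buffer with an equal tag."""
        for i, buf in enumerate(self.posted):
            if buf.tag == tag:
                buf.data = payload
                # The matched buffer is removed so that it is used only once.
                return self.posted.pop(i)
        # No posted buffer matches; handling this case is outside the sketch.
        return None


matcher = TagMatcher()
matcher.post(7)
wanted = matcher.post(9)
matched = matcher.deliver(9, b"payload")  # lands in the buffer tagged 9
```

In this sketch a message whose tag matches no posted buffer is simply reported as unmatched by returning `None`.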
Two protocols are generally used to send messages over MPI. The “Eager Protocol” is best suited to small messages, which are simply sent to the destination process and received in an appropriate matching buffer. The “Rendezvous Protocol” is better suited to large messages. In Rendezvous, when the sender process has a large message to send, it first sends a small message to the destination process announcing its intention to send the large message. This small message is referred to as an RTS (ready to send) message. The RTS includes the message tag and the address of the buffer in the sender. The destination process matches the RTS to a posted receive buffer, or posts such a buffer if one does not already exist. Once a matching receive buffer has been posted at the destination process, the receiver initiates a remote direct memory access (RDMA) read request to read the data from the buffer address listed by the sender in the RTS message.
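The difference between the two protocols can be sketched as a small in-memory simulation. Everything here is an assumption made for illustration: the threshold `EAGER_LIMIT`, the class and method names, and the `length` field of the RTS (the text above mentions only the tag and the buffer address). The RDMA read is modeled as a plain lookup in the sender's memory, and the receive buffer is assumed to be available whenever a message or an RTS arrives.

```python
from dataclasses import dataclass
from typing import Dict

EAGER_LIMIT = 1024  # hypothetical size threshold in bytes, not taken from the text


@dataclass
class RTS:
    tag: int      # message tag
    address: int  # address of the message buffer in the sender
    length: int   # assumed extra field: number of bytes to read


class SenderNode:
    def __init__(self) -> None:
        self.memory: Dict[int, bytes] = {}  # address -> buffer contents
        self._next_address = 0x1000

    def expose(self, payload: bytes) -> int:
        """Place the payload in sender memory and return its address."""
        address = self._next_address
        self.memory[address] = payload
        self._next_address += max(len(payload), 1)
        return address

    def rdma_read(self, address: int, length: int) -> bytes:
        """Stand-in for an RDMA read issued by the receiver."""
        return self.memory[address][:length]


class ReceiverNode:
    def __init__(self) -> None:
        self.buffers: Dict[int, bytes] = {}  # tag -> data stored in the matched buffer

    def on_eager(self, tag: int, payload: bytes) -> None:
        # Eager: the data arrives with the message itself.
        self.buffers[tag] = payload

    def on_rts(self, rts: RTS, sender: SenderNode) -> None:
        # Rendezvous: only the RTS arrived; the receiver pulls the data itself.
        self.buffers[rts.tag] = sender.rdma_read(rts.address, rts.length)


def send(sender: SenderNode, receiver: ReceiverNode, tag: int, payload: bytes) -> str:
    """Choose a protocol by message size and return its name."""
    if len(payload) <= EAGER_LIMIT:
        receiver.on_eager(tag, payload)
        return "eager"
    address = sender.expose(payload)
    receiver.on_rts(RTS(tag, address, len(payload)), sender)
    return "rendezvous"
```

Calling `send` with a payload of at most `EAGER_LIMIT` bytes takes the eager path, while a larger payload travels as an RTS followed by a read initiated on the receiver side.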
U.S. Pat. No. 8,249,072 describes an interface device for a compute node in a computer cluster, which performs MPI header matching using parallel matching units. The interface device comprises a memory, which stores posted receive queues and unexpected queues. The posted receive queues store receive requests from a process executing on the compute node. The unexpected queues store headers of send requests (e.g., from other compute nodes) that do not have a matching receive request in the posted receive queues. The interface device also comprises a plurality of hardware pipelined matcher units. The matcher units perform header matching to determine if a header in the send request matches any headers in any of the plurality of posted receive queues.
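The interplay between posted receive queues and unexpected queues can be sketched in software as follows. This is a sequential illustration only and does not model the parallel, pipelined hardware matcher units of the cited patent. The header layout `(source, tag)`, the class name `HeaderMatcher` and its methods are hypothetical, and checking the unexpected queue when a receive request is posted is an assumption of this sketch.

```python
from collections import deque
from typing import Deque, Optional, Tuple

Header = Tuple[int, int]  # hypothetical header layout: (source, tag)


class HeaderMatcher:
    def __init__(self) -> None:
        self.posted: Deque[Header] = deque()      # receive requests from the local process
        self.unexpected: Deque[Header] = deque()  # send headers with no matching receive

    def post_receive(self, header: Header) -> Optional[Header]:
        """Handle a receive request from the local process."""
        for queued in self.unexpected:
            if queued == header:
                # A send arrived earlier; consume it instead of queueing the receive.
                self.unexpected.remove(queued)
                return queued
        self.posted.append(header)
        return None

    def incoming_send(self, header: Header) -> Optional[Header]:
        """Handle the header of a send request arriving from another node."""
        for queued in self.posted:
            if queued == header:
                # A receive was already posted; the match consumes it.
                self.posted.remove(queued)
                return queued
        # No receive posted yet: remember the header as unexpected.
        self.unexpected.append(header)
        return None
```

A send header that arrives before its receive request is parked in `unexpected` and is matched later when the corresponding receive is posted; in the opposite order, the receive waits in `posted` until the send header arrives.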