1. Field of the Invention
The present invention relates, in general, to the field of computers and computing systems. More particularly, the present invention relates to a system and method for implementing a streamlined, low-level, user-mode, data transport mechanism for communicating messages between processes running on different nodes of a computer system cluster. In a representative embodiment disclosed herein, the mechanism may be implemented in conjunction with a switch/network adapter port (“SNAP™”, a trademark of SRC Computers, Inc.) and is denominated a SNAP explicit communication facility (“SNAPCF™”, also a trademark of SRC Computers, Inc.).
2. Relevant Background
In some instances, it is desirable to construct clustered computing systems out of nodes, with each node comprising some number of processors with a locally shared memory address space, such that it is possible for an application spanning a large number of nodes to participate in global communication. For performance reasons, it is important that this communication be performed with no per-transfer operating system involvement, that the end-to-end message latency be minimized and that the mechanism make efficient use of available link bandwidth. For overall usability, it is necessary that the mechanism provide a means by which the source (or “send”) side of each communication can correctly address the target memory at the destination (i.e., “naming”) in a manner which allows the operating system to maintain protected memory access.
Typically, communication in a clustered system takes place through a network interface or input/output (“I/O”) device. This has traditionally required operating system intervention in order to send messages. More recently, some network interface cards (“NICs”) have been designed to support user-level communication using operating system (“OS”) bypass interfaces such as Myrinet GM (a high-performance, packet-communication and switching technology that is widely used to interconnect clusters of workstations, PCs, servers, or single-board computers), Virtual Interface Architecture (“VIA”), and Scheduled Transport (“ST”). Without exception, these OS-bypass capable interfaces have been designed to allow packet-based communication between source and destination NICs with no intermediate storage. As such, buffer storage on both the “send” and “receive” sides is limited and must be dynamically managed to allow for the fact that data associated with any connection might be received at any time.
It has also been generally assumed that a requirement for asynchronous communication support exists (i.e., that the system processors should not need to be actively involved in the transport of message data). As a result, existing OS-bypass implementations all employ complex, time-consuming schemes for managing address mapping/translation, as well as direct memory access (“DMA”) transport.