A distributed computer system typically includes a number of interconnected nodes. Each node typically includes a processor and memory. In addition, the nodes include the network hardware and software necessary to communicate with other nodes in the distributed system. The distributed computer system may also contain one or more routers that pass messages between nodes. Often, a message from the sending node passes through one or more routers before arriving at the receiving node.
Nodes in the networked computer system may also have a user-level space, in which user-level applications (such as internet browsers) operate, and a kernel space. The separation provides stability and security in the system and allows the user-level application to operate without requiring knowledge of the underlying hardware. For example, the kernel is responsible for allocating memory in hardware for a user-level application to use when executing. When the user-level application requests memory, rather than simply returning to the user the physical address (e.g., disk, block, or sector), the kernel creates a virtual address space for the user-level application, thereby isolating the application from the hardware.
The virtual address space allows the user-level application to execute in what appears to be a contiguous block of memory without needing knowledge of the hardware or of the other applications running simultaneously. In order to provide such functionality, the kernel has a virtual memory system. The virtual memory system employs a Central Processing Unit (CPU) Memory Management Unit (MMU) to enforce the management of virtual memory. The MMU is responsible for allocating the memory and performing the translations from the virtual address to the physical address. Not only does the user-level application use virtual addressing in order to execute, but the kernel also uses virtual addresses when executing. Thus, the MMU provides translations not only for the user-level application, but also for the kernel-level application.
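The translation from a virtual address to a physical address can be illustrated with a toy page table. The following is only a minimal sketch, not a description of any particular MMU: the 4096-byte page size, the `translate` function, and the two example page tables are assumptions made for illustration.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (illustrative only)


def translate(page_table, virtual_address):
    """Translate a virtual address using a per-application page table.

    page_table maps a virtual page number to a physical frame number.
    """
    page_number, offset = divmod(virtual_address, PAGE_SIZE)
    if page_number not in page_table:
        # A real MMU would raise a page fault for the kernel to handle.
        raise LookupError(f"virtual page {page_number} is not mapped")
    return page_table[page_number] * PAGE_SIZE + offset


# Two hypothetical applications use the same virtual addresses,
# but each page table maps them to different physical frames.
app_a = {0: 7, 1: 3}
app_b = {0: 2, 1: 9}

physical_a = translate(app_a, 4100)  # virtual page 1, offset 4, frame 3
physical_b = translate(app_b, 4100)  # virtual page 1, offset 4, frame 9
```

Because each application has its own table, the same virtual address resolves to a different physical location for each one, which is the isolation described above.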
Another function that the kernel provides through the MMU is direct memory access (DMA). With DMA, a user-level application may store or retrieve information from memory with minimal use of the processor. The processor is required only for setting up the transfer and for notification when the transfer is complete.
When the user-level application sends a message on a node, the user-level application makes a request to the kernel (e.g., a system call) containing a socket descriptor, a pointer to a message structure, the length of the message, and any flags required when sending the message. The socket descriptor identifies a socket which is often created prior to the send request. A socket defines the endpoints of a connection. Specifically, when two applications are on two different nodes and are connected on a network, the socket provides a dedicated address whereby messages sent to and received from the socket belong to the application. The message structure contains the message itself. The flags specify any special condition upon which to send the message.
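A send request of this kind can be sketched with Python's standard `socket` module, whose `send` method takes the message bytes and a flags value and operates on a socket identified by a descriptor. This is only an illustration under stated assumptions: the `send_message` and `demo` helpers are hypothetical, and a local socket pair stands in for a connection between two nodes.

```python
import socket


def send_message(sock, payload, flags=0):
    """Hand a message to the kernel; returns the number of bytes accepted."""
    return sock.send(payload, flags)


def demo():
    # A connected pair of local sockets stands in for two endpoints.
    left, right = socket.socketpair()
    try:
        descriptor = left.fileno()            # the socket descriptor
        sent = send_message(left, b"hello")   # no special flags requested
        received = right.recv(1024)
        return descriptor, sent, received
    finally:
        left.close()
        right.close()
```

Here the socket is created before the send request, as in the description above, and the flags argument is left at zero because no special sending condition is needed.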
When a user sends a pointer to the message structure through the send request, the kernel makes a copy of the message. A copy is made because the kernel memory and user memory are in separate virtual spaces in an attempt to provide security. The copying is typically performed through DMA (as described above). Further, the virtual address is mapped to a physical input/output (I/O) address to send the message using the network interface. An I/O MMU is responsible for performing the mapping from the virtual address to the physical I/O address.
Once the mapping is performed, the kernel is responsible for encoding the message (including any encryption), providing error checking, and resending the message if necessary. The kernel then sends each packet of the message to the receiving node. When a packet arrives, the kernel of the receiving node places the packet in a message buffer of the appropriate application.
One mechanism for performing the protocols for sending the message is to offload the message onto a Network Interface Card (NIC). The NIC may then be used to perform the protocols for sending the message. In order to offload the messages, DMA may be employed to transfer the messages onto the NIC.
Often, a user-level application sends several messages at one time. These messages may be the same message sent to several clients, such as a server broadcasting packets to several clients in an online chat room, or different messages sent by the server to different clients. In order to send the messages, the user-level application makes a separate system call requesting to send the message for each receiving node. With each system call, the processor is used to perform the DMA and the I/O MMU performs the mapping (as described above).
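The per-receiver cost can be seen in a short sketch that issues one `sendto` call for each receiving address. The `send_to_each` and `demo` helpers, the loopback addresses, and the use of UDP are assumptions chosen for illustration, not details taken from the description above.

```python
import socket


def send_to_each(sender, payload, receivers):
    """Send the same payload to every receiver, one send call per receiver."""
    calls = 0
    for address in receivers:
        sender.sendto(payload, address)  # a separate system call each time
        calls += 1
    return calls


def demo(num_clients=3):
    # Hypothetical clients listening on loopback ports chosen by the kernel.
    clients = []
    for _ in range(num_clients):
        client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        client.bind(("127.0.0.1", 0))
        client.settimeout(2.0)
        clients.append(client)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        addresses = [client.getsockname() for client in clients]
        calls = send_to_each(sender, b"chat update", addresses)
        received = [client.recv(1024) for client in clients]
        return calls, received
    finally:
        sender.close()
        for client in clients:
            client.close()
```

The number of calls grows linearly with the number of receivers, even when every receiver is sent identical data.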
In order to avoid the overhead of making several system calls, an Internet Protocol (IP) multicast group address may be created. In such a scheme, the sending node specifies the IP multicast group address as the destination of the message. The receiving nodes request membership in the group using the group address. Upon receiving a message with the group address, any router with a list of the group members forwards the message to each member. This scheme requires that the same data be sent to every member, and that the routers be capable of identifying the members of the group and forwarding the messages to those members.
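As a rough sketch of this scheme, the standard socket options `IP_ADD_MEMBERSHIP` and `IP_MULTICAST_TTL` let a receiver request group membership and a sender address a single message to the group. The group address 239.1.1.1, port 5007, and all function names below are hypothetical values chosen for illustration, and whether delivery succeeds depends on the multicast support of the local network.

```python
import ipaddress
import socket
import struct

GROUP = "239.1.1.1"  # hypothetical multicast group address
PORT = 5007          # hypothetical port


def membership_request(group, interface="0.0.0.0"):
    """Build the group-membership structure passed to IP_ADD_MEMBERSHIP."""
    if not ipaddress.ip_address(group).is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))


def join_group(group=GROUP, port=PORT):
    """Receiver side: bind to the port and request membership in the group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock


def send_to_group(payload, group=GROUP, port=PORT, ttl=1):
    """Sender side: a single send call addressed to the group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        # The TTL limits how many router hops the message may cross.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
        return sock.sendto(payload, (group, port))
    finally:
        sock.close()
```

The sender makes one call regardless of how many receivers have joined, which reflects the trade-off noted above: the fan-out work moves to the network, and every member receives the same data.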