A distributed computer system typically includes a number of interconnected nodes. Each node typically includes a processor and memory. In addition, the nodes typically include the hardware and software necessary to communicate with other nodes in the distributed system.
Nodes in the distributed computer system may be separated into a user-level space, in which user-level applications (such as internet browsers) operate, and a kernel space. The separation allows for stability and security in the system and allows the user-level application to operate without requiring knowledge of the underlying hardware. For example, the kernel may send and receive messages over the network and process each message on behalf of the user-level application. Processing a message may involve translating the signals received from the network into the message, determining the application that is to receive the message, coordinating the message passing with the sending node, etc.
When the user-level application wants to receive a message, the user-level application makes a request to the kernel (i.e., a system call) that may contain a socket descriptor, a pointer to a message structure for storing the message, the length of the message, any flags required when receiving the message, etc. The socket descriptor may identify a socket, which is often created prior to the receive request. A socket defines an endpoint of a connection. Specifically, when two user-level applications on two different nodes communicate, each of the user-level applications has a dedicated socket. Because the sockets are dedicated, messages sent to the socket associated with a user-level application are intended only for that user-level application. The flags specify any special conditions under which the message is to be received.
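The receive request described above can be illustrated with a minimal sketch, assuming a POSIX-like system and Python's standard socket module. The helper names receive_one and demo_receive are hypothetical, and a local socket pair stands in for two applications that would normally run on different nodes. The sketch passes the four items named above: the socket (which wraps the descriptor), a buffer for storing the message, a length, and flags.

```python
import socket


def receive_one(sock, max_len, flags=0):
    """Issue one receive request (hypothetical helper, for illustration)."""
    buf = bytearray(max_len)  # storage that the kernel fills with the message
    # One system call: socket, buffer, length, and flags are all supplied.
    nbytes = sock.recv_into(buf, max_len, flags)
    return bytes(buf[:nbytes])


def demo_receive():
    # A connected pair of datagram sockets stands in for two applications.
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        sender.send(b"hello")
        # MSG_PEEK is one example of a flag: read without consuming.
        peeked = receive_one(receiver, 64, socket.MSG_PEEK)
        # With no flags, the message is consumed from the socket.
        consumed = receive_one(receiver, 64)
        return peeked, consumed
    finally:
        sender.close()
        receiver.close()
```

Calling demo_receive() returns the same message twice, because the first request only peeks at the message and the second request consumes it.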
Often, a user-level application has several messages to receive. These messages may be received from several clients or from a single application. Each message requires a separate system call. Thus, each system call returns a single message, and the user-level application consumes the messages one at a time. Each system call incurs overhead, which may derive from context switching, memory accesses, translation lookaside buffer (TLB) flushes, etc.
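The one-call-per-message pattern can be sketched as follows, again assuming a POSIX-like system and Python's standard socket module. The names drain_messages and demo_per_message_calls are hypothetical. The counter only tallies receive requests; it does not measure the overhead of each call.

```python
import socket


def drain_messages(sock, expected, max_len=64):
    """Receive `expected` messages, one receive request per message."""
    messages = []
    receive_calls = 0
    while len(messages) < expected:
        messages.append(sock.recv(max_len))  # each call returns one message
        receive_calls += 1
    return messages, receive_calls


def demo_per_message_calls(count=3):
    # A local datagram socket pair stands in for a sender and a receiver.
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        for i in range(count):
            sender.send(f"msg-{i}".encode())
        return drain_messages(receiver, count)
    finally:
        sender.close()
        receiver.close()
```

In this sketch the number of receive requests always equals the number of messages, so any fixed per-call cost is paid once for every message.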
Often, messages arrive at a greater rate than the user-level application can consume them. When this occurs, messages are queued at the socket layer. The message queue at the socket layer may have a low water mark and a high water mark. The low water mark indicates that the user-level application has consumed enough messages and is ready to receive more messages. The high water mark indicates that messages are arriving at the node faster than the user-level application is able to consume them. In order to avoid unnecessary storage, when the number of queued messages reaches the high water mark, messages may be deleted. Specifically, rather than processing received messages for an application whose message queue is at the high water mark, the kernel may simply delete the messages. Thus, the messages have to be resent from the sending node. Network bandwidth is wasted because messages are being sent and not processed. Therefore, throughput, or the number of messages that are sent and processed in a given period of time, decreases for all applications.
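The water mark behavior can be modeled with a small simulation. This is a sketch under stated assumptions, not a description of any particular kernel: the class name WatermarkQueue and its methods are hypothetical, the marks are expressed as message counts, and the queue is treated as ready for more messages when its length is at or below the low water mark.

```python
from collections import deque


class WatermarkQueue:
    """Toy model of a socket-layer message queue (hypothetical)."""

    def __init__(self, low_water, high_water):
        if not 0 <= low_water < high_water:
            raise ValueError("need 0 <= low_water < high_water")
        self.low_water = low_water
        self.high_water = high_water
        self.pending = deque()
        self.dropped = 0  # messages the sender would have to resend

    def enqueue(self, message):
        """Kernel side: queue the message, or delete it at the high mark."""
        if len(self.pending) >= self.high_water:
            self.dropped += 1
            return False
        self.pending.append(message)
        return True

    def dequeue(self):
        """Application side: consume the oldest queued message."""
        return self.pending.popleft()

    def ready_for_more(self):
        """Assumed reading of the low mark: enough has been consumed."""
        return len(self.pending) <= self.low_water
```

For example, with a low water mark of 1 and a high water mark of 3, offering five messages queues the first three and deletes the last two; after the application consumes two messages, the queue reports that it is ready for more.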