The Internet Protocol (“IP”) is a network data transfer protocol that serves as the foundation of almost all Internet communications. Other protocols (for example, Apple Computer, Incorporated's AppleTalk™ and Novell Incorporated's NetWare™) serve some of the same needs. Network protocols are used to transfer data from system to system over a variety of wired and wireless physical media such as Ethernet™, Token Ring, Wi-Fi®, and InfiniBand®. Systems that are to exchange data must have hardware such as a network interface card (“NIC”) to interface to the physical media, driver software to control the NIC, software or hardware to implement the network protocol, and software to produce or receive the data to be transferred.
Many network protocols were designed to provide high throughput, or the ability to transfer large amounts of information quickly. However, in numerous networked applications, performance depends not on throughput but on latency, or the time between transmission and receipt of data. For example, a server might perform calculations for a client, where the input and output of the calculations are small amounts of data. If the server notifies the client when a computation is finished and the client then sends a new unit of work, the delay between the server's notification and the client's response with new work is wasted time during which the server could be doing useful work. When cooperating processes that communicate over a network exchange queries and replies in this “ping-pong” fashion, delays in delivering data from the network to the application can consume a significant fraction of the applications' run times, so reductions in latency can provide large performance benefits.
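The ping-pong pattern can be illustrated with a minimal sketch, assuming Python and a local `socket.socketpair()` standing in for a real network connection; the function names `recv_exact`, `echo_server`, `mean_round_trip_seconds`, and `idle_fraction` are hypothetical. The client may send its next query only after the previous reply arrives, so the mean round-trip time bounds how fast work can flow. The `idle_fraction` helper assumes a simplified model in which the server computes for `compute_s` seconds and then waits one full round trip for its next unit of work.

```python
import socket
import threading
import time


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a stream socket (a single recv may return fewer)."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def echo_server(sock: socket.socket, n_rounds: int, size: int) -> None:
    """Hypothetical 'server': answers each small query at once with an equal-sized reply."""
    for _ in range(n_rounds):
        sock.sendall(recv_exact(sock, size))


def mean_round_trip_seconds(n_rounds: int = 1000, size: int = 8) -> float:
    """Run n_rounds strict ping-pong exchanges and return the mean round-trip time."""
    client, server = socket.socketpair()  # local stand-in for a network link (assumption)
    worker = threading.Thread(target=echo_server, args=(server, n_rounds, size))
    worker.start()
    query = b"q" * size
    try:
        start = time.perf_counter()
        for _ in range(n_rounds):
            client.sendall(query)
            recv_exact(client, size)  # no new work can be sent until the reply arrives
        elapsed = time.perf_counter() - start
    finally:
        client.close()  # if the loop failed, this unblocks the server thread
        worker.join()
        server.close()
    return elapsed / n_rounds


def idle_fraction(compute_s: float, round_trip_s: float) -> float:
    """Share of each work cycle the server spends waiting rather than computing."""
    return round_trip_s / (compute_s + round_trip_s)
```

For example, `idle_fraction(1.0, 1.0)` returns `0.5`: if a round trip takes as long as the computation itself, the server is idle half the time. A loopback measurement from `mean_round_trip_seconds()` omits NIC and wire delays, so it understates real network latency.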
Some approaches to reducing network latency involve changing (or replacing) network protocols or performing protocol processing in hardware, and both have serious drawbacks. For example, iWARP (Internet Wide Area RDMA Protocol, where RDMA stands for Remote Direct Memory Access) achieves some latency reduction but requires applications to be modified to use the protocol, while RNICs (RDMA-capable Network Interfaces) are complex and expensive hardware devices. Simpler, lower-cost alternatives that reduce network latency without requiring software to be redesigned, and that make better use of inexpensive network interfaces such as stateless Ethernet controllers, may therefore be of significant value.
Cooperating applications that use network protocols to communicate often use a generic interface provided by an operating system to perform the lower-level tasks involved in transmitting and receiving data over a network. For example, an application may use a “read” subroutine to obtain data from another system on a network, or a “write” subroutine to transmit data to another system. The subroutines may be provided as shared object files, shared or static libraries, or similar formats such as dynamic link libraries (“DLLs”). Shared formats in particular permit the software implementing the lower-level network tasks to be corrected or upgraded without modifying the applications themselves. For example, a shared library providing an improved “read” subroutine could be installed on a system, and every application that uses the library would benefit from the improved subroutine without being rebuilt.
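The separation between an application and the subroutines it calls can be sketched in Python, assuming a POSIX system: `os.read` and `os.write` are thin wrappers over the `read(2)` and `write(2)` system calls, and on POSIX they accept socket descriptors (they do not on Windows). The application function `transfer` and the "upgraded" subroutines `read_fully` and `write_fully` are hypothetical. Passing the subroutines in as parameters is only an analogy for dynamic linking, not how a shared library is actually loaded.

```python
import os
import socket
from typing import Callable

ReadFn = Callable[[int, int], bytes]   # (descriptor, max bytes) -> data
WriteFn = Callable[[int, bytes], int]  # (descriptor, data) -> bytes written


def transfer(read: ReadFn, write: WriteFn, send_fd: int, recv_fd: int, data: bytes) -> bytes:
    """Hypothetical application: moves data from one endpoint to the other using only
    the read/write subroutines it is given, never the protocol stack directly."""
    write(send_fd, data)
    return read(recv_fd, len(data))


def read_fully(fd: int, n: int) -> bytes:
    """Hypothetical 'upgraded' read: repeats os.read until n bytes arrive or EOF,
    so callers no longer see short reads."""
    buf = bytearray()
    while len(buf) < n:
        chunk = os.read(fd, n - len(buf))
        if not chunk:
            break  # end of file: return whatever arrived
        buf.extend(chunk)
    return bytes(buf)


def write_fully(fd: int, data: bytes) -> int:
    """Hypothetical 'upgraded' write: repeats os.write until every byte is accepted."""
    view = memoryview(data)
    total = 0
    while total < len(data):
        total += os.write(fd, view[total:])
    return total


if __name__ == "__main__":
    a, b = socket.socketpair()
    # Same application code, original subroutines versus upgraded ones.
    print(transfer(os.read, os.write, a.fileno(), b.fileno(), b"hello"))
    print(transfer(read_fully, write_fully, a.fileno(), b.fileno(), b"world"))
    a.close()
    b.close()
```

`transfer` is unchanged between the two calls; only the subroutines it receives differ, which mirrors the benefit of installing an improved shared library.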
Many computer systems are controlled by an operating system (“OS”) that can create the illusion that the system is performing several tasks simultaneously. In fact, only one task is executing at a time (in a system with multiple central processing units (“CPUs”), one task may be executing on each CPU) and the operating system switches between tasks many times per second, allowing each task to execute for a period of time called a “time slice.” Switching from one task to another (a “context switch”) is relatively expensive because the operating system must save the current state of one task and load the previously saved state of the next task, and the new task may execute slowly at first if, for example, the CPU must load its cache memories with instructions or data used by the new task. Nevertheless, some application programs are designed to relinquish the processor intentionally (“block”) when they must wait for an event to occur, thus giving up the remainder of their time slice, instead of repeatedly checking whether the event has occurred in a “busy-waiting” polling loop.
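The two waiting strategies can be contrasted in a minimal Python sketch, assuming a local socket pair and a timer thread that delivers data after a short delay; `wait_blocking`, `wait_busy`, and `send_later` are hypothetical names. `select.select` blocks, so the operating system can run other tasks until the socket becomes readable. The busy-wait version polls a non-blocking socket in a tight loop and counts how many checks it made before data arrived.

```python
import select
import socket
import threading
import time
from typing import Tuple


def wait_blocking(sock: socket.socket, timeout: float = 5.0) -> bytes:
    """Block in select() until sock is readable; the OS may schedule other tasks meanwhile."""
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        raise TimeoutError("no data arrived")
    return sock.recv(64)


def wait_busy(sock: socket.socket, timeout: float = 5.0) -> Tuple[bytes, int]:
    """Poll a non-blocking socket in a tight loop; return the data and the number of polls."""
    sock.setblocking(False)
    polls = 0
    deadline = time.monotonic() + timeout
    try:
        while time.monotonic() < deadline:
            polls += 1
            try:
                return sock.recv(64), polls
            except BlockingIOError:
                continue  # nothing yet: check again at once instead of giving up the processor
        raise TimeoutError("no data arrived")
    finally:
        sock.setblocking(True)  # restore blocking mode for later callers


def send_later(sock: socket.socket, data: bytes, delay: float) -> threading.Timer:
    """Simulate an event (e.g. a reply from a peer) that occurs after `delay` seconds."""
    timer = threading.Timer(delay, sock.sendall, args=(data,))
    timer.start()
    return timer


if __name__ == "__main__":
    a, b = socket.socketpair()
    send_later(a, b"done", 0.05).join() if False else None
    t = send_later(a, b"done", 0.05)
    print(wait_blocking(b))
    t.join()
    t = send_later(a, b"again", 0.05)
    data, polls = wait_busy(b)
    t.join()
    print(data, polls)  # polls is typically large: each iteration burned CPU time
    a.close()
    b.close()
```

Both functions return the same data; the difference is that the busy-wait loop keeps its thread runnable and consuming CPU for the whole delay, while the blocking call lets the operating system use that time for other tasks, at the cost of a context switch when the data arrives.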
At a lower level, logically between the operating system and the underlying hardware, computer systems and their component subsystems use signals called interrupts to trigger certain processing sequences. For example, a NIC that has received a packet may issue an interrupt to ensure that the packet is dealt with quickly. An interrupt causes a CPU to suspend its current operations and to execute an interrupt service routine (“ISR”), which can perform any time-sensitive actions that must occur immediately, and can arrange for other actions to happen within the operating system's normal task scheduling system. Interrupts may be less expensive than a full context switch, but can nevertheless consume a significant amount of processing time, particularly if they are issued at a high rate.