The present invention relates to a method and system for processing data, and in particular it relates to processing data in accordance with a data transfer protocol.
FIG. 1 shows elements of a computer system capable of implementing a conventional protocol stack, such as a transmission control protocol (TCP) stack in a computer connected to a network. The computer system includes an application 101, a socket 102 and an operating system 103 incorporating a kernel 104. A network interface such as a network interface card (NIC) 106 is provided for interfacing between the computer system and the network. The socket 102 connects the application to a remote entity by means of a network protocol, in this example TCP/IP. The application can send and receive TCP/IP messages by opening a socket and reading and writing data to and from the socket, and the operating system causes the messages to be transported across the network via the NIC. One socket is typically provided for each network endpoint with which an application wishes to communicate. The application can invoke a system call (syscall) to transmit data onto the network. Syscalls can be thought of as functions that take a series of arguments and cause the CPU to switch to a privileged execution level and start executing the operating system. Each syscall takes a specific list of arguments, and the arguments vary depending on the type of syscall.
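As a minimal illustration of this socket model, the following Python sketch opens a TCP connection over the loopback interface, writes data through one socket and reads it from the other. Python's `socket` methods are thin wrappers over the corresponding system calls (`socket`, `bind`, `listen`, `connect`, `accept`, `send`, `recv`). The function name `loopback_exchange` is hypothetical, and the example runs entirely on the local host rather than across a NIC.

```python
import socket


def loopback_exchange(message: bytes) -> bytes:
    """Send `message` over a local TCP/IP connection and return what arrives."""
    # Listening socket bound to an ephemeral port on the loopback interface.
    with socket.create_server(("127.0.0.1", 0)) as listener:
        port = listener.getsockname()[1]
        # One socket per endpoint: the client connects, the server side accepts.
        with socket.create_connection(("127.0.0.1", port)) as client:
            server_side, _addr = listener.accept()
            with server_side:
                client.sendall(message)          # one or more send() syscalls
                client.shutdown(socket.SHUT_WR)  # tell the peer no more data follows
                chunks = []
                while True:
                    chunk = server_side.recv(4096)  # recv() blocks until data or EOF
                    if not chunk:
                        break
                    chunks.append(chunk)
                return b"".join(chunks)
```

Each socket object wraps a file descriptor (available via `fileno()`), which is what the underlying syscalls actually receive.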
Syscalls made by applications in a computer system can indicate a file descriptor (sometimes called a handle), which is usually an integer that identifies an open file within a process. A file descriptor is obtained each time a file is opened or a socket or other resource is created. File descriptors can be re-used, but at any given time a descriptor uniquely identifies an open file or other resource within the context of a process. Thus, when a resource (such as a file) is closed, its descriptor is released, and when another resource is subsequently opened the same descriptor can be re-used to identify the new resource. Operations that, for example, read from, write to or close the resource take the corresponding file descriptor as an input parameter. When invoked, a system call causes the operating system to execute routines that are specific to the resource identified by the file descriptor given in the syscall.
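Descriptor re-use can be observed directly. In the sketch below, Python's `os.open`, `os.write` and `os.close` map onto the POSIX calls of the same names: a file is opened, written and closed, and a second file is then opened. Because POSIX `open()` returns the lowest-numbered descriptor not currently open in the process, the second file normally receives the number just released. The function name is hypothetical, and the result assumes no other thread opens a descriptor in between.

```python
import os
import tempfile


def demonstrate_descriptor_reuse() -> tuple[int, int]:
    """Open, write and close one file, then open another; return both descriptors."""
    with tempfile.TemporaryDirectory() as tmp:
        first_path = os.path.join(tmp, "first.txt")
        second_path = os.path.join(tmp, "second.txt")

        fd_first = os.open(first_path, os.O_CREAT | os.O_WRONLY)
        os.write(fd_first, b"data")  # the descriptor is the input parameter
        os.close(fd_first)           # the descriptor number is released

        # open() hands out the lowest free descriptor, i.e. the one just freed.
        fd_second = os.open(second_path, os.O_CREAT | os.O_WRONLY)
        os.close(fd_second)
        return fd_first, fd_second
```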
In the context of networking, syscalls are used by applications to invoke a stack to send data, and to consume data that has been received, optionally blocking until more data arrives. In this context, a stack is a set of software and/or hardware resources that implement a collection of sockets. Other system calls are used for control plane operations such as creating and destroying sockets, connecting to remote endpoints, and querying the state of sockets.
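The difference between blocking and non-blocking consumption of received data, together with a control-plane query of socket state, can be sketched as follows. A connected pair from `socket.socketpair()` (a local stream socket pair, not a TCP connection across a network) stands in for a connected socket; the function name `receive_modes` is hypothetical.

```python
import socket


def receive_modes() -> tuple[bool, bytes, int]:
    """Show a non-blocking recv with no data, a blocking recv, and a state query."""
    sender, receiver = socket.socketpair()  # created as a connected pair
    with sender, receiver:                  # both destroyed (closed) on exit
        receiver.setblocking(False)         # recv must return immediately
        try:
            receiver.recv(1024)
            would_block = False
        except BlockingIOError:             # nothing queued yet
            would_block = True

        sender.sendall(b"ping")
        receiver.setblocking(True)          # recv may now wait for data
        data = receiver.recv(1024)

        # Control-plane style query: ask the socket what type it is.
        sock_type = receiver.getsockopt(socket.SOL_SOCKET, socket.SO_TYPE)
        return would_block, data, sock_type
```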
In a typical network arrangement, packets arriving at a NIC are delivered into buffers in host memory, and a notification is sent to the NIC's device driver in the operating system kernel. The channel over which this notification is delivered typically consists of a queue of notifications, which may also include notifications of other types of event, such as the successful transmission of outgoing packets. This communication channel is referred to in the following description as an event queue.
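The event queue can be modelled as a bounded first-in, first-out queue of typed notifications, with the NIC as producer and the device driver as consumer. The Python sketch below is a software model only: `EventType`, `Event`, `EventQueue`, the `capacity` limit and the `post`/`poll` method names are hypothetical, and the model ignores how the NIC and driver actually share the queue in memory.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from enum import Enum, auto


class EventType(Enum):
    RX = auto()           # a received packet was placed in a host buffer
    TX_COMPLETE = auto()  # an outgoing packet was transmitted successfully


@dataclass(frozen=True)
class Event:
    kind: EventType
    buffer_id: int        # hypothetical index of the host-memory buffer concerned


class EventQueue:
    """Hypothetical bounded notification queue between a NIC and its driver."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._events: deque[Event] = deque()

    def post(self, event: Event) -> bool:
        """NIC side: append an event; return False if the queue is full."""
        if len(self._events) >= self.capacity:
            return False
        self._events.append(event)
        return True

    def poll(self) -> Event | None:
        """Driver side: remove and return the oldest event, or None if empty."""
        return self._events.popleft() if self._events else None

    def __len__(self) -> int:
        return len(self._events)
```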
When network events occur in the computer system, the device driver must at some point process the event queue by inspecting each event notification and processing the received packets. It is desirable for this to happen promptly, since undue delay in processing received packets may hold up the progress of applications or cause the link to go idle. In conventional systems, processing of the event queue is invoked by an interrupt that the NIC generates at the time the event is delivered to the event queue.
An interrupt causes the CPU to save the state of whatever process is currently running, and switch control to an interrupt service routine. This routine processes the event queue, and carries out network processing for the received packets. Thus network processing is carried out in a timely manner and at high priority in response to packet arrival.
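The conventional interrupt-driven path can be sketched as a model in which every delivery to the event queue immediately raises an interrupt whose service routine drains the queue. The class `InterruptDrivenNic` and its methods are hypothetical, and the saving and restoring of process state is indicated only by comments.

```python
from collections import deque


class InterruptDrivenNic:
    """Hypothetical model: each event delivered to the queue raises an interrupt."""

    def __init__(self) -> None:
        self.event_queue: deque[str] = deque()
        self.interrupts = 0
        self.processed: list[str] = []

    def deliver(self, packet: str) -> None:
        self.event_queue.append(packet)
        self._interrupt()  # raised at the moment the event is enqueued

    def _interrupt(self) -> None:
        # Here the CPU would save the state of the running process (not modelled).
        self.interrupts += 1
        self._service_routine()
        # ...and restore that state once the routine returns (not modelled).

    def _service_routine(self) -> None:
        # Drain the event queue and do network processing for each packet.
        while self.event_queue:
            self.processed.append(self.event_queue.popleft())
```

In this model, delivering three packets raises three interrupts, one per event notification.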
A disadvantage of this mechanism is that interrupts incur a high overhead: the state of the running process must be saved and subsequently restored, the CPU must interact with the hardware, and the contents of the processor's memory caches are disturbed.
It is widely known that performance can be improved by reducing the rate at which interrupts are invoked. One way to achieve this is interrupt moderation, which imposes a minimum time gap between successive interrupts. This may delay the processing of received packets slightly, but it means that the overhead of each interrupt is effectively spread over a larger number of event notifications.
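The effect of a minimum gap between interrupts can be worked through with a small simulation over event arrival times. The function `moderate` is hypothetical and assumes a simple policy: an event arriving while an interrupt is already scheduled is handled by that interrupt; otherwise an interrupt is scheduled at the later of the arrival time and the previous interrupt time plus `min_gap`. Time units are arbitrary.

```python
from __future__ import annotations


def moderate(arrivals: list[float], min_gap: float) -> list[tuple[float, int]]:
    """Return (interrupt_time, events_handled) pairs under a minimum interrupt gap."""
    interrupts: list[tuple[float, int]] = []
    pending_time: float | None = None  # time of the next scheduled interrupt
    pending_count = 0                  # events that interrupt will handle
    last_fired = float("-inf")

    for t in sorted(arrivals):
        if pending_time is not None and t <= pending_time:
            pending_count += 1         # coalesced into the scheduled interrupt
            continue
        if pending_time is not None:   # the scheduled interrupt has now fired
            interrupts.append((pending_time, pending_count))
            last_fired = pending_time
        pending_time = max(t, last_fired + min_gap)
        pending_count = 1

    if pending_time is not None:
        interrupts.append((pending_time, pending_count))
    return interrupts
```

With arrivals at 0, 1, 2, 3 and 10 and `min_gap=5`, the model raises three interrupts, at times 0, 5 and 10, the second handling three events; with `min_gap=0` it raises five. The events at times 1, 2 and 3 wait until time 5, which is the slight delay traded for fewer interrupts.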
Another means of reducing interrupt overheads is “Lazy Receiver Processing”, discussed at http: (slash) (slash) www.cs.rice.edu/CS/Systems/LRP/final.html in an article entitled “Lazy Receiver Processing: A Network Subsystem Architecture for Server Systems” by Peter Druschel and Gaurav Banga. In this model interrupts are not enabled by default. Instead, any outstanding event notifications in the event queue are processed when the application invokes the stack via a system call. Received packets are therefore processed promptly, provided that the application invokes the stack frequently. When the application is blocked waiting to send or receive on a socket, it is not available to process the event queue. To ensure that events are still handled during this time, interrupts are enabled and the event queue is processed in the conventional way.
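The Lazy Receiver Processing policy described above can be modelled for a single socket as follows. This is a hypothetical sketch, not the design from the cited article: the class `LazyReceiverStack`, its method names, and the convention that `syscall_recv` returns `None` where a real call would block (with `wake` standing for the application resuming) are all assumptions.

```python
from __future__ import annotations

from collections import deque


class LazyReceiverStack:
    """Hypothetical single-socket model of Lazy Receiver Processing."""

    def __init__(self) -> None:
        self.event_queue: deque[bytes] = deque()    # notifications from the NIC
        self.socket_buffer: deque[bytes] = deque()  # packets after processing
        self.interrupts_enabled = False             # disabled by default
        self.interrupts_taken = 0

    def nic_deliver(self, packet: bytes) -> None:
        self.event_queue.append(packet)
        if self.interrupts_enabled:                 # only while the app is blocked
            self.interrupts_taken += 1
            self._process_event_queue()             # conventional interrupt path

    def _process_event_queue(self) -> None:
        while self.event_queue:
            self.socket_buffer.append(self.event_queue.popleft())

    def syscall_recv(self) -> bytes | None:
        """Process outstanding events, then return data, or None if it would block."""
        self._process_event_queue()                 # done on the app's behalf
        if self.socket_buffer:
            return self.socket_buffer.popleft()
        self.interrupts_enabled = True              # about to block
        return None

    def wake(self) -> bytes | None:
        """The blocked application resumes; interrupts are disabled again."""
        self.interrupts_enabled = False
        return self.socket_buffer.popleft() if self.socket_buffer else None
```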
A problem with the Lazy Receiver Processing scheme is that if the process neither invokes the stack frequently nor blocks waiting on a socket, the event queue may not be processed in a timely fashion. This can be resolved by providing a kernel thread that is able to process the event queue from time to time as necessary, as described in the applicant's co-pending PCT application no. PCT/GB06/002202. However, it may not always be possible to implement this mechanism in a way that is both efficient and timely, partly because it involves an additional thread competing with applications for CPU time.
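A periodic helper of the general kind mentioned above can be sketched with an ordinary thread that wakes at a fixed interval and drains any outstanding events. This is a user-space illustration under assumed names (`PeriodicEventQueueThread`, `interval_s`, `post`, `drain`), not the mechanism of the cited PCT application. It does show the cost identified above: every wake-up consumes CPU time whether or not events are waiting.

```python
import threading
from collections import deque


class PeriodicEventQueueThread:
    """Hypothetical helper thread that drains an event queue from time to time."""

    def __init__(self, interval_s: float) -> None:
        self.interval_s = interval_s
        self.event_queue: deque[str] = deque()
        self.processed: list[str] = []
        self._lock = threading.Lock()          # queue shared with the application
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def post(self, event: str) -> None:
        with self._lock:
            self.event_queue.append(event)

    def drain(self) -> int:
        """Process all outstanding events; callable by the app or the thread."""
        with self._lock:
            count = len(self.event_queue)
            while self.event_queue:
                self.processed.append(self.event_queue.popleft())
            return count

    def _run(self) -> None:
        # wait() returns False on timeout (drain again) and True once stopped.
        while not self._stop.wait(self.interval_s):
            self.drain()
```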