This invention relates to methods for managing I/O event notifications in a data processing system and to methods for minimising the latency and jitter associated with the management of I/O event notifications.
Typically, input and output (I/O) operations in an operating system will be handled using file descriptors, which are abstract indicators that can provide a reference to the files, sockets, FIFOs, block or character devices, or other I/O resources maintained by the operating system. Processes running on the operating system use the file descriptors in calls to the operating system in order to reference the files, sockets etc. of the system, with the operating system maintaining a kernel-resident data structure containing the correspondence between each file descriptor and resource (file, socket etc.) of the system. This allows, for example, an application to read from a particular file by means of a read( ) system call to the operating system that includes the file descriptor associated with that file. The operating system looks up the file corresponding to the provided file descriptor and, if the necessary permissions are satisfied, performs the requested read operation on behalf of the application.
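The descriptor-based access described above can be illustrated with a minimal sketch. It assumes a POSIX system and uses a pipe in place of a file so that the example is self-contained; the helper name fd_roundtrip is hypothetical and is introduced only for illustration.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write msg into a pipe and read it back through the
 * descriptor of the pipe's read end. Returns bytes read, or -1 on error. */
ssize_t fd_roundtrip(const char *msg, char *buf, size_t buflen)
{
    assert(msg != NULL && buf != NULL);

    int fds[2];                 /* fds[0]: read end, fds[1]: write end */
    if (pipe(fds) != 0)
        return -1;

    ssize_t n = -1;
    size_t len = strlen(msg);
    if (write(fds[1], msg, len) == (ssize_t)len) {
        /* The kernel looks up the resource that fds[0] refers to and
         * performs the read on behalf of the caller. */
        n = read(fds[0], buf, buflen);
    }

    close(fds[0]);
    close(fds[1]);
    return n;
}
```

The application never handles the underlying resource directly; it only passes the integer descriptor back to the operating system in each call.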
An operating system generally provides several mechanisms for managing the file descriptors that are used for I/O. For example, in Linux, the select, poll and epoll mechanisms are provided as part of the system call API, each of which allows a process to monitor sets of file descriptors for an event such as a file descriptor becoming ready to perform I/O, or data being updated at the memory location identified by the file descriptor. The sets of file descriptors that an application wishes to monitor are typically held in objects that can be managed through the mechanisms provided by the operating system, with each object holding a set of file descriptors relating to the application. For example, in Linux an application can establish an epoll instance for handling a set of file descriptors that the application wishes to monitor, the epoll instance being managed by means of epoll_create( ), epoll_ctl( ) and epoll_wait( ) system calls provided by the epoll mechanism.
Each of the mechanisms will typically have different performance characteristics. For example, the epoll_wait( ) system call is more efficient than the equivalent select( ) and poll( ) calls when the set of file descriptors is large. However, the epoll mechanism is relatively inefficient when the set of monitored file descriptors and I/O events changes because this additionally requires calls to epoll_ctl( ). Since the calls to a kernel mechanism such as epoll are system calls, they require a context switch from the application making the call into the kernel and are therefore relatively expensive in terms of processing overhead. Each context switch consumes processing resources and can introduce unwanted latency into the system. Epoll is typically used as follows:
/* Create an epoll set. */
epoll_set = epoll_create(...);
/* Add file descriptor(s) to the set. */
epoll_ctl(epoll_set, EPOLL_CTL_ADD, fd, event);
/* Wait for file descriptors in set to become ready. */
n_events = epoll_wait(epoll_set, events, maxevents, timeout);
/* Use the file descriptors that are ready. */
for( i = 0; i < n_events; ++i )
  ...
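As a hedged illustration of the epoll call sequence, the following sketch is a complete, Linux-only version of the same steps. The function name epoll_demo, the use of a pipe as the monitored resource, the buffer size of 4 events and the 1000 ms timeout are all assumptions made for the example.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical demo: monitor the read end of a pipe with epoll.
 * Returns the number of ready descriptors reported, or -1 on error. */
int epoll_demo(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    /* Create an epoll set. */
    int epoll_set = epoll_create1(0);
    if (epoll_set < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    struct epoll_event event = { .events = EPOLLIN, .data.fd = fds[0] };
    struct epoll_event events[4];
    int n_events = -1;

    /* Add the descriptor to the set (one system call), make it readable,
     * then wait for it to become ready (a second system call). */
    if (epoll_ctl(epoll_set, EPOLL_CTL_ADD, fds[0], &event) == 0 &&
        write(fds[1], "x", 1) == 1)
        n_events = epoll_wait(epoll_set, events, 4, 1000);

    /* Use the descriptors that are ready: only ready entries are returned. */
    for (int i = 0; i < n_events; ++i) {
        char c;
        assert(events[i].data.fd == fds[0]);
        if (read(events[i].data.fd, &c, 1) != 1)
            break;
    }

    close(epoll_set);
    close(fds[0]);
    close(fds[1]);
    return n_events;
}
```

Note that registering the descriptor and waiting on the set are separate system calls, which is the source of the additional overhead when the monitored set changes.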
In contrast, with the select or poll mechanisms, the set of file descriptors to monitor is supplied in a single select( ) or poll( ) call that also waits for the file descriptors to become ready. For example, the above epoll use can be achieved as follows with poll:
/* Add file descriptor(s) to a set. */
pfds[n_fds].fd = fd;
pfds[n_fds].events = POLLIN;
++n_fds;
/* Wait for file descriptors in set to become ready. */
n_events = poll(pfds, n_fds, timeout);
/* Use the file descriptors that are ready. */
for( i = 0; i < n_fds; ++i )
  if( pfds[i].revents )
    ...
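A complete version of the poll sequence might look like the following sketch, which assumes a POSIX system. The function name poll_demo, the two pipes used as monitored resources and the 1000 ms timeout are assumptions made for illustration.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical demo: monitor two pipes with poll, making only the second
 * readable. Returns the index of the ready entry, or -1 if none is found. */
int poll_demo(void)
{
    int a[2], b[2];
    if (pipe(a) != 0)
        return -1;
    if (pipe(b) != 0) {
        close(a[0]);
        close(a[1]);
        return -1;
    }

    /* Add descriptors to the set: plain user-space stores, no system call. */
    struct pollfd pfds[2];
    int n_fds = 0;
    pfds[n_fds].fd = a[0]; pfds[n_fds].events = POLLIN; ++n_fds;
    pfds[n_fds].fd = b[0]; pfds[n_fds].events = POLLIN; ++n_fds;

    int ready_index = -1;
    if (write(b[1], "x", 1) == 1) {
        /* A single call both supplies the set and waits for readiness. */
        int n_events = poll(pfds, n_fds, 1000);
        assert(n_events <= n_fds);

        /* poll reports readiness in place, so every entry is scanned. */
        for (int i = 0; n_events > 0 && i < n_fds; ++i)
            if (pfds[i].revents & POLLIN)
                ready_index = i;
    }

    close(a[0]); close(a[1]);
    close(b[0]); close(b[1]);
    return ready_index;
}
```

Because the whole array is passed to the kernel and scanned by the caller on every call, the cost of this approach grows with the number of monitored descriptors.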
Thus, the poll and select mechanisms tend to be more efficient when the set of file descriptors to be monitored changes frequently over time, and the epoll mechanism tends to be more efficient when the set of file descriptors is relatively large and remains relatively static.
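The trade-off can be made concrete with a small sketch of what it costs to change the monitored set under each mechanism. The helper names epoll_swap and poll_swap are hypothetical, and the returned counts simply tally the system calls that each helper issues; they are not measurements.

```c
#include <assert.h>
#include <poll.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: replace old_fd with new_fd in an epoll set.
 * Returns the number of system calls issued (2), or -1 on error. */
int epoll_swap(int epoll_set, int old_fd, int new_fd)
{
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = new_fd };

    if (epoll_ctl(epoll_set, EPOLL_CTL_DEL, old_fd, NULL) != 0)
        return -1;
    if (epoll_ctl(epoll_set, EPOLL_CTL_ADD, new_fd, &ev) != 0)
        return -1;
    return 2;
}

/* Hypothetical helper: replace the descriptor held in one pollfd slot.
 * Returns the number of system calls issued (0): the array lives in user
 * space and is only handed to the kernel by the next poll() call. */
int poll_swap(struct pollfd *slot, int new_fd)
{
    assert(slot != NULL);
    slot->fd = new_fd;
    slot->events = POLLIN;
    slot->revents = 0;
    return 0;
}
```

Under these assumptions each change to an epoll set costs additional entries into the kernel, whereas a change to a poll set costs none until the next wait.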
Another problem with conventional I/O event notification mechanisms is that they can introduce significant latency and jitter into the processing performed by the threads of an application due to the blocking of I/O event notification threads that are waiting for events at the descriptors monitored by the application. This is of particular concern in data processing systems that have user-level network stacks operating over high bandwidth network interface devices. In order to provide a low latency, high speed data path between a user-level stack and its network interface device, it is generally important to minimise the latency and jitter experienced by the stack due to kernel processes.
There is therefore a need for an I/O event notification mechanism that can be efficiently used with large sets of file descriptors when the set of file descriptors changes frequently. There is also a need for improved mechanisms for invoking system calls so as to minimise the latency and jitter associated with the management of I/O event notifications.