1. Field of the Invention
The invention relates to network interfaces, and more particularly to mechanisms for validating network traffic sent or received by user level libraries in a virtual network architecture.
2. Description of Related Art
A typical computer system includes a processor subsystem (including one or more processors), a memory subsystem (including main memory, cache memory, etc.), and a variety of “peripheral devices” connected to the processor subsystem via a peripheral bus. Peripheral devices may include, for example, keyboard, mouse and display adapters, disk drives and CD-ROM drives, network interface devices, and so on. The processor subsystem communicates with the peripheral devices by reading and writing commands and information to specific addresses that have been preassigned to the devices. The addresses may be preassigned regions of a main memory address space, an I/O address space, or another kind of configuration space. Communication with peripheral devices can also take place via direct memory access (DMA), in which the peripheral devices (or another agent on the peripheral bus) transfers data directly between the memory subsystem and one of the preassigned regions of address space assigned to the peripheral devices.
Most modern computer systems are multitasking, meaning they allow multiple different application programs to execute concurrently on the same processor subsystem. Most modern computer systems also run an operating system which, among other things, allocates time on the processor subsystem for executing the code of each of the different application programs. One difficulty that might arise in a multitasking system is that different application programs may wish to control the same peripheral device at the same time. In order to prevent such conflicts, another job of the operating system is to coordinate control of the peripheral devices. In particular, only the operating system can access the peripheral devices directly; application programs that wish to access a peripheral devices must do so by calling routines in the operating system. The placement of exclusive control of the peripheral devices in the operating system also helps to modularize the system, obviating the need for each separate application program to implement its own software code for controlling the hardware.
The placement of exclusive control of the peripheral devices in the operating system also permits management of another potential difficulty, that of improper control or handling of the peripheral device. For network interface devices, for example, improper or inappropriate control of the devices could compromise other applications running in the computer system, or could compromise or otherwise negatively impact operation of the network to which the device is connected. In established operating systems, much of the software code for controlling these devices has evolved over a number of years and has been updated and improved in response to numerous tests by numerous people on numerous types of network interface devices. The software code in the operating system has therefore developed a certain level of trust: users, network administrators, network architects and other network devices can presume that the great majority of packets originating from this software code will conform to network protocol specifications. Additional code for controlling each particular peripheral device is incorporated into the operating system in the form of a device driver specific to the particular peripheral device. Device drivers are usually written by or in association with the manufacturer of the particular peripheral device, so they too are afforded a certain level of trust.
The part of the operating system that controls the hardware is usually the kernel. Typically it is the kernel which performs hardware initializations, setting and resetting the processor state, adjusting the processor internal clock, initializing the network interface device, and other direct accesses of the hardware. The kernel executes in kernel mode, also sometimes called trusted mode or a privileged mode, whereas application level processes execute in a user mode. Typically it is the processor subsystem hardware itself which ensures that only trusted code, such as the kernel code, can access the hardware directly. The processor enforces this in at least two ways: certain sensitive instructions will not be executed by the processor unless the current privilege level is high enough, and the processor will not allow user level processes to access memory locations (including memory mapped addresses associated with specific hardware resources) which are outside of a user-level physical or virtual address space already allocated to the process. As used herein, the term “kernel space” or “kernel address space” refers to the address and code space of the executing kernel. This includes kernel data structures and functions internal to the kernel. The kernel can access the memory of user processes as well, but “kernel space” generally means the memory (including code and data) that is private to the kernel and not accessible by any user process. The term “user space”, or “user address space”, refers to the address and code space allocated by a code that is loaded from an executable and is available to a user process, excluding kernel private code data structures. As used herein, all four terms are intended to accommodate the possibility of an intervening mapping between the software program's view of its own address space and the physical memory locations to which it corresponds. Typically the software program's view of its address space is contiguous, whereas the corresponding physical address space may be discontiguous and out-of-order, and even potentially partly on a swap device such as a hard disk drive. Address spaces are sometimes referred to herein as “virtual” address spaces, in order to emphasize the possibility of such mappings.
Although parts of the kernel may execute as separate ongoing kernel processes, much of the kernel is not actually a separate process running on the system. Instead it can be thought of as a set of routines, to some of which the user processes have access. A user process can call a kernel routine by executing a system call, which is a function that causes the kernel to execute some code on behalf of the process. The “current process” is still the user process, but during system calls it is executing “inside of the kernel”, and therefore has access to kernel address space and can execute in a privileged mode. Kernel code is also executed in response to an interrupt issued by a hardware device, since the interrupt handler is found within the kernel. The kernel also, in its role as process scheduler, switches control between processes rapidly using the clock interrupt (and other means) to trigger a switch from one process to another. Each time a kernel routine is called, the current privilege level increases to kernel mode in order to allow the routine to access the hardware directly. When the kernel relinquishes control back to a user process, the current privilege level returns to that of the user process.
When a user level process desires to communicate with the NIC, conventionally it can do so only through calls to the operating system. The operating system implements a system level protocol processing stack which performs protocol processing on behalf of the application, and also performs certain checks to make sure outgoing data packets have authorized characteristics and are not malformed. In particular, an application wishing to transmit a data packet using TCP/IP calls the operating system API (e.g. using a send( ) call) with data to be transmitted. This call causes a context switch to invoke kernel routines to copy the data into a kernel data buffer and perform TCP send processing. Here protocol is applied and fully formed TCP/IP packets are enqueued with the interface driver for transmission. Another context switch takes place when control is returned to the application program. Note that kernel routines for network protocol processing may be invoked also due to the passing of time. One example is the triggering of retransmission algorithms. Generally the operating system provides all OS modules with time and scheduling services (driven by the hardware clock interrupt), which enable the TCP stack to implement timers on a per-connection basis. The operating system performs context switches in order to handle such timer-triggered functions, and then again in order to return to the application.
It can be seen that network transmit and receive operations can involve excessive context switching, and this can cause significant overhead. The problem is especially severe in networking environments in which data packets are often short, causing the amount of required control work to be large as a percentage of the overall network processing work.
One solution that has been attempted in the past has been the creation of user level protocol processing stacks operating in parallel with those of the operating system. Such stacks can enable data transfers using standard protocols to be made without requiring data to traverse the kernel stack. In one implementation, TCP and other protocols are implemented twice: once built into the kernel and once built into a user level transport library accessible to application programs. In order to control and/or communicate with the network interface device an application issues API (application programming interface) calls. Some API calls may be handled by the user level transport libraries, and the remainder can typically be passed on through the interface between the application and the operating system to be handled by the libraries that are available only to the operating system. For implementation with many operating systems it is convenient for the transport libraries to use existing Ethernet/IP based control-plane structures: e.g. SNMP and ARP protocols via the OS interface.
There are a number of difficulties in implementing transport protocols at user level. Most implementations to date have been based on porting pre-existing kernel code bases to user level. Examples of these are Arsenic and Jet-stream. These have demonstrated the potential of user-level transports, but have not addressed a number of the problems required to achieve a complete, robust, high-performance commercially viable implementation.
One particular problem with user-level transport libraries is that in bypassing many of the routines normally performed in the kernel, they also lose the trust normally accorded those routines. This is because the kernel no longer has control of the user-level routines and cannot enforce their identity with those in the kernel. Users or application programs are able to modify the user-level transport routines, or replace them with others provided by a third party. As a result, the support of user-level transport libraries to bypass kernel routines and avoid context switches, increases the risk of malformed or even malicious traffic driven onto the network.
Part of the risk of permitting user-level transport libraries can be overcome by virtualizing the network interface device in such a way that each process is aware of only its own resources. The hardware can be virtualized in such a way that one process cannot transmit or receive data on behalf of another, nor can one process see the data belonging to another process. But this kind of virtualization does not prevent a process from transmitting problematic data packets out onto the network through its own assigned resources; hence trust is still not ensured.
In order to address issues like the latter, roughly described, a network interface device receiving data packets from a computing device for transmission onto a network, the data packets having a certain characteristic, transmits the packet only if the sending queue has authority to send packets having that characteristic. The data packet characteristics can include transport protocol number, source and destination port numbers, source and destination IP addresses, for example. Authorizations can be programmed into the NIC by a kernel routine upon establishment of the transmit queue, based on the privilege level of the process for which the queue is being established. In this way, a user process can use an untrusted user-level protocol stack to initiate data transmission onto the network, while the NIC protects the remainder of the system from certain kinds of compromise.