Access to computer networks has become a ubiquitous part of today's computer usage. Whether accessing a Local Area Network (LAN) in an enterprise environment to reach shared network resources, or accessing the Internet via the LAN or another access point, it seems users are always logged on to at least one service that is accessed via a computer network. Moreover, the rapid expansion of cloud-based services has led to even further usage of computer networks, and these services are forecast to become ever more prevalent.
Expansion of network usage, particularly via cloud-based services, has been facilitated by substantial increases in network bandwidths and processor capabilities. For example, broadband network backbones typically support bandwidths of 10 Gigabits per second (Gbps) or more, while the standard for today's personal computers is a network interface designed to support a 1 Gbps Ethernet link. On the processor side, processor capabilities have been increased through both faster clock rates and use of more than one processor core. For instance, today's PCs may employ a dual-core processor or a quad-core processor, while servers may employ processors with even more cores. For some classes of servers, it is common to employ multiple processors to enhance performance. In addition, it is envisioned that much of the future increase in processor performance will result from architectures employing greater numbers of cores, and that future servers may employ greater numbers of processors.
In computer systems, network access is typically facilitated through use of a Network Interface Controller (NIC), such as an Ethernet NIC. In recent years, server NICs have been designed to support many optimizations for multi-core, multi-processor platform architectures. These optimizations include Receive Side Scaling (RSS) and Application Targeted Routing (ATR).
In recent years, virtualization of computer systems has seen rapid growth, particularly in server deployments and data centers. Under a conventional approach, a server runs a single instance of an operating system directly on physical hardware resources, such as the CPU, RAM, storage devices (e.g., hard disk), network controllers, I/O ports, etc. Under a virtualized approach, the physical hardware resources are employed to support corresponding virtual resources, such that multiple Virtual Machines (VMs) may run on the server's physical hardware resources, wherein each virtual machine includes its own CPU allocation, memory allocation, storage devices, network controllers, I/O ports, etc. Multiple instances of the same or different operating systems then run on the multiple VMs. Moreover, through use of a virtual machine manager (VMM) or “hypervisor,” the virtual resources can be dynamically allocated while the server is running, enabling VM instances to be added, shut down, or repurposed without requiring the server to be shut down. This provides greater flexibility for server utilization, and better use of server processing resources, especially for multi-core processors and/or multi-processor servers.
Under a conventional approach employing server virtualization, physical or logical cores (such as those implemented in processors using Intel® Corporation's Hyper-threading™ architectures) are allocated to VMs at a fixed ratio, such as 1:1. As packets are received at NIC receive (Rx) ports, some initial packet processing operations are performed to determine where in system memory the packets are to be written. This entails a DMA (direct memory access) write of the packet from a NIC input buffer to a buffer in system memory allocated to the VM that is the consumer of the packet or that is otherwise used to perform packet forwarding operations. DMA operations are usually facilitated using high-speed interconnects such as Peripheral Component Interconnect Express (PCIe) links that are coupled between a NIC and the multi-core host processor. PCIe employs packet-based memory transactions (e.g., DMA writes to system memory) over a multi-lane serial link structure, enabling inbound traffic to be multiplexed effectively using applicable queuing techniques. Once the packets are in system memory, additional forwarding-related operations are performed by software-based entities using host processor resources, such as networking software that is part of an operating system running on the host processor or networking software running on a VM.
Currently, for more efficient packet processing, NICs segment their dedicated receive (Rx) and transmit (Tx) memory into queues (also commonly referred to as buffers), usually equal in number to the number of physical or logical cores in the host processor. Through RSS and advanced filtering mechanisms such as Intel Corporation's Flow Director, network flows are assigned to Rx queues. Each core in the system processes packets from a specific Rx and Tx queue pair through use of interrupt affinity. Ideally, this achieves maximum parallelization: the NIC load-balances network traffic by spreading flows across different queues, so each core gets a (relatively) fair share of the total received network traffic. Although this has been a good technique, it does not scale well for NICs operating at higher bandwidths, such as 10+ Gbps. In particular, the PCIe interconnect(s) become saturated and the processor caches are prone to thrashing. In modern data center servers where several virtual machines (VMs) run in the same host sharing the same NIC, packet processing becomes a bottleneck. In addition, VM-to-VM communication, even within the same system, occurs via network communication, which typically involves use of an external switch.
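By way of illustration only, the assignment of flows to Rx queues described above can be sketched in software. The sketch below assumes a Toeplitz-style hash over the IPv4 source address, destination address, source port, and destination port, followed by a lookup in an indirection table, which is one common way RSS is realized. The key value, the table size, and the names `toeplitz_hash`, `build_indirection_table`, and `select_rx_queue` are hypothetical and chosen for this example; they are not taken from any particular NIC or driver.

```python
import socket
import struct

# Hypothetical 40-byte hash key, chosen arbitrarily for this sketch.
# Real deployments program the key into the NIC through the driver.
RSS_KEY = bytes(range(1, 41))

# Assumed indirection table size for this sketch.
TABLE_SIZE = 128


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """Return a 32-bit Toeplitz hash of `data` under `key`.

    For every set bit of the input (most significant bit first), the
    32-bit window of the key that starts at the same bit offset is
    XORed into the result.
    """
    if len(key) < len(data) + 4:
        raise ValueError("key must be at least 4 bytes longer than data")
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i, byte in enumerate(data):
        for bit in range(8):
            if byte & (0x80 >> bit):
                shift = key_bits - 32 - (i * 8 + bit)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result


def build_indirection_table(num_queues: int) -> list:
    """Spread table entries round-robin across the available Rx queues."""
    return [entry % num_queues for entry in range(TABLE_SIZE)]


def select_rx_queue(src_ip: str, dst_ip: str, src_port: int,
                    dst_port: int, table: list) -> int:
    """Map an IPv4 flow to an Rx queue index using the hash and table."""
    flow = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!HH", src_port, dst_port))
    return table[toeplitz_hash(RSS_KEY, flow) % len(table)]


if __name__ == "__main__":
    table = build_indirection_table(num_queues=4)
    for port in (40000, 40001, 40002, 40003):
        queue = select_rx_queue("10.0.0.1", "10.0.0.2", port, 80, table)
        print(f"flow with source port {port} -> Rx queue {queue}")
```

Because the queue index depends only on the flow's addresses and ports, every packet of a given flow lands in the same queue and is therefore handled by the same core, while distinct flows tend to be spread across the queues.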