In recent years, an increasing number of servers equipped with a plurality of central processing units (CPUs) have adopted a non-uniform memory access (NUMA) architecture, which facilitates scale-up.
FIG. 14 is a view illustrating a server of a NUMA architecture. As illustrated in FIG. 14, the server of the NUMA architecture has a plurality of NUMA nodes. For convenience of explanation, only two NUMA nodes, namely a NUMA node #1 and a NUMA node #2, are illustrated in FIG. 14, but the server of the NUMA architecture may have three or more NUMA nodes.
The server of the NUMA architecture has a separate memory for each CPU. That is, the NUMA node #1 has a CPU #1 and a memory #1, and the NUMA node #2 has a CPU #2 and a memory #2. Each CPU includes a memory controller, an inter-processor link (IPL) controller, and an I/O controller.
The memory controller is connected to a memory and controls access to that memory. The IPL controller controls communication between the CPUs. The I/O controller controls the I/O devices connected to it. A network interface card (NIC) is connected to the I/O controller of the NUMA node #1.
In the server of the NUMA architecture, the basic software, namely the operating system (OS), allocates memory to an application from the same NUMA node as the CPU on which the application is running, so that memory access contention among the CPUs may be reduced. Therefore, a server of the NUMA architecture is easier to scale up than a server of a uniform memory access (UMA) architecture, in which plural CPUs are connected to a memory via a common memory controller.
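The node-local allocation policy described above can be modeled in a few lines. The following is a minimal Python sketch, not an OS implementation: the `NumaTopology` class and its `allocate_local` and `is_local` methods are hypothetical names introduced for illustration, and memory is represented only as a per-node byte count.

```python
from dataclasses import dataclass, field


@dataclass
class NumaTopology:
    """Hypothetical model: maps each CPU id to the NUMA node that owns it."""
    cpu_to_node: dict[int, int]
    # Bytes allocated on each node, tracked only for illustration.
    allocated: dict[int, int] = field(default_factory=dict)

    def allocate_local(self, cpu: int, size: int) -> int:
        """Allocate `size` bytes on the node of `cpu` (node-local policy)
        and return that node's id."""
        node = self.cpu_to_node[cpu]
        self.allocated[node] = self.allocated.get(node, 0) + size
        return node

    def is_local(self, cpu: int, node: int) -> bool:
        """True if `cpu` can reach memory on `node` without crossing the IPL."""
        return self.cpu_to_node[cpu] == node
```

For example, `NumaTopology({1: 1, 2: 2}).allocate_local(cpu=2, size=4096)` returns `2`, i.e., an application running on the CPU #2 receives memory from the NUMA node #2.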
However, in the server of the NUMA architecture, performance degrades in a virtual environment that uses a virtual switch. FIG. 15 is a view for explaining this performance degradation. In FIG. 15, a VM #1 is a virtual machine (VM) operating on the NUMA node #1, and a VM #2 is a virtual machine operating on the NUMA node #2. The VM #1 and the VM #2 receive packets from a network via the virtual switch, each using its own virtual NIC (vNIC) reception buffer. The vNIC reception buffer of the VM #1 is located in the memory #1, and the vNIC reception buffer of the VM #2 is located in the memory #2.
The virtual switch is implemented by plural threads executed by the CPUs, each thread running on either the CPU #1 or the CPU #2. The virtual switch has two virtual ports (vports) for the VMs and a logical port associated with a physical port of the NIC.
When a physical port of the NIC, represented by a pNIC #1, receives a packet from the network (1), the received packet is written to a reception buffer of the pNIC #1 by direct memory access (DMA) (2). The reception buffer of the pNIC #1 is generally located in the memory #1 of the NUMA node #1, to which the NIC is connected. Then, a thread of the virtual switch reads the packet from the reception buffer of the pNIC #1 (3) and, when the packet is addressed to the VM #2, writes the packet to the vNIC reception buffer of the VM #2 (4).
Here, since the thread of the virtual switch operates on the CPU #1, the write to the vNIC reception buffer of the VM #2, which is located in the memory #2, is a write to remote memory that must cross the inter-processor link. Therefore, writes to the vNIC reception buffer of the VM #2 perform substantially worse than writes to the vNIC reception buffer of the VM #1, which is local to the CPU #1.
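The receive path of FIG. 15 can be traced with the following minimal sketch, which records for each numbered step whether the accessing agent and the memory are on the same NUMA node. The function `receive_path` and the constant `NIC_NODE` are hypothetical; the model ignores caches and timing and only classifies accesses as local or remote.

```python
# Hypothetical model of the FIG. 15 receive path; node ids follow the figure.
NIC_NODE = 1  # The NIC is attached to the I/O controller of NUMA node #1.


def receive_path(switch_thread_node: int, dest_vm_node: int,
                 pnic_buffer_node: int = NIC_NODE) -> list[tuple[str, bool]]:
    """Return (step, is_local) pairs for one received packet.

    is_local is True when the accessing agent and the memory it touches
    are on the same NUMA node.
    """
    return [
        # (2) The NIC writes the packet into the pNIC #1 reception buffer by DMA.
        ("dma_to_pnic_buffer", pnic_buffer_node == NIC_NODE),
        # (3) A virtual-switch thread reads the packet from the pNIC buffer.
        ("switch_read", switch_thread_node == pnic_buffer_node),
        # (4) The same thread writes the packet to the VM's vNIC reception buffer.
        ("switch_write_vnic", switch_thread_node == dest_vm_node),
    ]
```

With the virtual-switch thread on the NUMA node #1 and the destination VM #2 on the NUMA node #2, `receive_path(1, 2)` reports steps (2) and (3) as local and step (4) as remote, matching the case described above.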
Thus, a technique has been proposed in which a physical reception queue for each NUMA node is allocated to a physical port of the NIC, a virtual port for each NUMA node is associated with the physical port of the NIC, a reception queue is allocated to each virtual port, and the DMA destination of each physical reception queue is set to a reception buffer on the corresponding NUMA node. According to this technique, it is possible to prevent degradation of communication performance between the NIC and a VM on a different NUMA node.
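The per-NUMA-node queue arrangement can be sketched as follows. This is a simplified model under stated assumptions, not the proposed technique's actual implementation: `RxQueue`, `build_per_node_queues`, and `deliver` are hypothetical names, the NIC is assumed to already know which NUMA node the destination VM is on (how packets are classified into queues is not modeled), and each queue is assumed to be served by a virtual-switch thread on the queue's own node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RxQueue:
    """Hypothetical physical reception queue of the NIC's physical port."""
    queue_id: int
    dma_buffer_node: int  # NUMA node whose memory holds this queue's buffer


def build_per_node_queues(nodes: list[int]) -> dict[int, RxQueue]:
    """Allocate one physical reception queue per NUMA node, with the DMA
    destination set to a reception buffer on that same node."""
    return {node: RxQueue(queue_id=i, dma_buffer_node=node)
            for i, node in enumerate(nodes)}


def deliver(queues: dict[int, RxQueue], vm_node: int) -> tuple[RxQueue, bool]:
    """Steer a packet for a VM on `vm_node` to that node's queue and report
    whether the virtual switch's write to the vNIC buffer is local."""
    queue = queues[vm_node]
    # Assumption: the thread serving a queue (via the per-node virtual port)
    # runs on the same node as the queue's DMA buffer.
    switch_thread_node = queue.dma_buffer_node
    return queue, switch_thread_node == vm_node
```

With queues built for the NUMA nodes #1 and #2, a packet for the VM #2 is placed in the queue whose buffer is on the NUMA node #2, so in this model the virtual switch's write to the vNIC reception buffer of the VM #2 is local.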
In addition, there has been proposed a technique in which a load balancer allocates a VM using a specific NUMA node to a network queue of the same NUMA node, and a scheduler allocates the VM to the same NUMA node as the NIC or the network queue.
Further, a technique has been proposed in which a network adapter routes a received packet having a packet flow identifier to the transmission/reception queue associated with that packet flow identifier, among plural transmission/reception queues that store packets to be transferred.
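Flow-based queue selection reduces to a lookup from a flow identifier to a queue index. The sketch below is hypothetical: the `FlowSteeringAdapter` class, string flow identifiers, and the fallback to a default queue for unassociated flows are assumptions made for illustration and are not taken from the cited technique.

```python
from collections import deque


class FlowSteeringAdapter:
    """Hypothetical network adapter that routes each received packet to the
    transmission/reception queue associated with its packet flow identifier."""

    def __init__(self, num_queues: int, default_queue: int = 0) -> None:
        self.queues = [deque() for _ in range(num_queues)]
        self.flow_table: dict[str, int] = {}  # flow identifier -> queue index
        # Assumption: packets of unknown flows fall back to a default queue.
        self.default_queue = default_queue

    def associate(self, flow_id: str, queue_index: int) -> None:
        """Associate a packet flow identifier with a queue."""
        self.flow_table[flow_id] = queue_index

    def receive(self, flow_id: str, payload: bytes) -> int:
        """Enqueue the packet on the queue associated with flow_id (or the
        default queue) and return the index of the queue used."""
        index = self.flow_table.get(flow_id, self.default_queue)
        self.queues[index].append(payload)
        return index
```

After `associate("flow-a", 1)`, every packet of `flow-a` is enqueued on queue 1, while packets of flows with no entry go to queue 0.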
Related technologies are disclosed in, for example, U.S. Pat. Nos. 9,495,192 and 9,069,722.