Improvement of the operation frequency of a CPU (Central Processing Unit) is approaching its limit. Therefore, CPU development has shifted from improving the operation frequency to increasing the number of cores. Under these circumstances, multi-core CPUs provided with a plurality of cores have become widespread.
A proxy apparatus is provided between a client and a server to relay a client side connection and a server side connection. Herein, the client side connection means a connection between the client and the proxy apparatus, and the server side connection means a connection between the server and the proxy apparatus. Also, an increasing number of such proxy apparatuses are provided with a multi-core CPU. Thus, architectures and implementations have been proposed for operating the multi-core CPU more efficiently.
Also, parallelism is important to attain high scalability by using the multi-core CPU. In the multi-core CPU, synchronization processing among the cores degrades processing performance. Therefore, in the proxy apparatus, it is necessary to perform control so as to avoid synchronization among the CPU cores.
When the proxy apparatus treats the two processes of the client side connection and the server side connection as one session, little data needs to be shared between different sessions. Therefore, the most effective method of parallelizing such communication is known to be separating the CPU cores that execute processes in units of sessions. This method is referred to as "session distribution", hereinafter.
Patent Literature 1 (WO 2009/073295A) discloses a technique in which a dispatcher, which determines the allocation of resources for a process, selects an address of the proxy apparatus for the server side connection such that packets transmitted through both the client side connection and the server side connection are processed by the same CPU core.
According to Patent Literature 1, since the dispatcher allocates the packets transmitted through the two connections to the same CPU core, session distribution is realized; thus, synchronization with other CPU cores is unnecessary and scalability can be improved.
On the other hand, there exists a proxy apparatus that terminates a TCP (Transmission Control Protocol) connection. Such a proxy apparatus is realized as a user-space application utilizing sockets.
A thread executing a proxy process in the proxy apparatus (to be referred to as a "proxy thread", hereinafter) establishes the TCP connection from the client and acquires a socket for that connection. Subsequently, the proxy thread acquires data from the socket and executes predetermined processes such as checking the data, processing it, and determining the presence or absence of a cache entry. Then, the proxy thread generates a socket for the server side connection, establishes the connection with the server, and sends the data to that socket.
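The flow of the proxy thread described above can be sketched as follows. This is a minimal single-session illustration, not an actual product implementation: the listen socket and server address are supplied by the caller, and the "predetermined processes" are reduced to a pass-through.

```python
import socket

def proxy_session(listener, server_addr):
    # Establish the client side connection: accept() takes the pending
    # connection off the listen socket's queue and returns its socket.
    client_sock, _ = listener.accept()
    with client_sock:
        # Acquire data from the client side socket.
        data = client_sock.recv(4096)
        # A real proxy would check or process the data here (e.g. consult
        # a cache); this sketch forwards it unchanged.
        # Generate a socket for the server side connection, establish the
        # connection with the server, and send the data to it.
        with socket.create_connection(server_addr) as server_sock:
            server_sock.sendall(data)
            reply = server_sock.recv(4096)
        # Relay the server's reply back over the client side connection.
        client_sock.sendall(reply)
```

A production proxy would of course loop over both sockets and handle partial reads and connection teardown; the sketch shows only the accept/process/connect/send sequence of one session.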
In order to realize session distribution in such a proxy apparatus, the system generates as many proxy threads as there are CPU cores and fixedly allocates each proxy thread to one CPU core. Thus, the two socket processes of a given session, for the client side connection and the server side connection, are executed by a proxy thread operating on a single CPU core.
In this processing, when a client side connection becomes ready to be established, information on its socket is added to the queue of the listen socket. A proxy thread calls the accept function in order to establish the connection, whereby the socket corresponding to the connection is taken out of the queue and the connection is established. Thereafter, the processing of the established connection is executed by the proxy thread that established it.
Therefore, the CPU core that processes a given client side connection is determined at the time the connection is established. Further, since the server side connection is established after the proxy thread establishes the client side connection, the two connections of a session, the client side connection and the server side connection, are processed by a proxy thread fixedly allocated to the same CPU core.
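The per-core arrangement above can be sketched as follows, assuming a Linux system (`os.sched_setaffinity` is Linux-specific) and a per-session handler supplied by the caller; the function and parameter names are illustrative, not from the source.

```python
import os
import threading

def run_proxy_thread(core_id, listener, handle_session):
    # Fixedly allocate this proxy thread onto one CPU core
    # (Linux-specific; pid 0 means the calling thread).
    os.sched_setaffinity(0, {core_id})
    while True:
        # accept() takes an establishable connection out of the listen
        # socket's queue; every subsequent process for that connection
        # is then executed by this thread on its fixed core.
        client_sock, _ = listener.accept()
        handle_session(client_sock)

def start_proxy(listener, handle_session):
    # Generate the same number of proxy threads as CPU cores, one per
    # core; all of them share the single listen socket.
    for core_id in range(os.cpu_count()):
        threading.Thread(target=run_proxy_thread,
                         args=(core_id, listener, handle_session),
                         daemon=True).start()
```

Because whichever thread happens to call accept() first wins the connection, this arrangement alone does not control *which* core a given connection lands on, which is exactly the first problem discussed later in this section.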
Thus, the proxy process can be parallelized in user space, and high scalability can be obtained by taking advantage of the multi-core CPU.
Recently, in order to improve the performance of Web servers mounted with a multi-core CPU, NICs (Network Interface Cards) have appeared that have a function such as Receive Side Scaling or Receive Packet Steering (these functions collectively being referred to as "RSS", hereinafter) to dynamically distribute received packets to CPU cores that are free for processing.
Such an RSS installed NIC has a function of calculating a hash value from information contained in the header of a received packet and determining, based on the hash value, the CPU core to be interrupted.
Since the packets transmitted on a certain connection contain the same header information, all of those packets are assigned to the same CPU core. That is, for each packet received by the RSS installed NIC, the kernel thread executes the protocol process in response to the receipt of the packet, and the processing up to registration of the packet data into the buffer of the corresponding socket is executed by the same CPU core.
In this manner, by using the RSS installed NIC, the processes by the kernel threads can be executed in parallel for every connection, and high scalability can be obtained by taking advantage of the multi-core CPU.
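The RSS core-selection principle can be sketched as follows. Actual RSS hardware typically computes a Toeplitz hash with a configurable key and maps it through an indirection table; this sketch substitutes a CRC-32 over the connection's header fields and a modulo, preserving only the property that matters here: identical header fields always yield the same core.

```python
import zlib

def select_core(src_ip, src_port, dst_ip, dst_port, num_cores):
    # Hash the header fields that identify the connection. A real RSS
    # installed NIC computes a Toeplitz hash over these fields in
    # hardware; CRC-32 stands in for it in this sketch.
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    hash_value = zlib.crc32(key)
    # The hash indexes an indirection table of CPU cores; simplified
    # here to a modulo over the number of cores.
    return hash_value % num_cores
```

Because all packets of one connection carry identical header fields, the selection is deterministic per connection. The client side and server side connections of one session carry different header fields, however, so nothing guarantees that they map to the same core, which is the second problem discussed below.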
However, when a proxy apparatus is realized so that the processing from the reception of packets by the RSS installed NIC to the proxy process in user space is executed by the same CPU core, using the method described in Patent Literature 1, the following problems exist.
The first problem is that the CPU core on which the kernel thread processing a certain connection operates is not always the same as the CPU core on which the proxy thread for that connection operates. This is because a proxy thread operating on a CPU core different from that of the kernel thread may call the accept function first and establish the connection.
Once a proxy thread fixedly allocated to a different CPU core establishes a connection, the processing of that connection is thereafter executed by that same proxy thread. Since the proxy thread and the kernel thread then operate on different CPU cores, high-speed processing that takes advantage of the CPU cache cannot be realized.
Therefore, in order that the kernel thread and the proxy thread are executed on the same CPU core, a mechanism is needed in which the client side connection is established by the proxy thread operating on the same CPU core as the kernel thread that executed the process of establishing that connection, and the server side connection is then established by that same proxy thread.
The second problem is that, since the data contained in the headers of the packets belonging to the two connections, the client side connection and the server side connection, differ, the packets transmitted on the two connections are not always processed by kernel threads operating on the same CPU core even when the RSS installed NIC is used.
As the RSS installed NIC provided in the proxy apparatus, a third-vendor product is usually used, and in many cases the algorithm by which such a product determines the distribution destination CPU core to be interrupted is unknown. Therefore, it is difficult to select an address for the server side connection such that the RSS installed NIC allocates the packets on the client side connection and the packets on the server side connection to the same CPU core, as in Patent Literature 1.
If the server side connection is allocated by the RSS installed NIC to a kernel thread operating on a different CPU core, the proxy thread must synchronize with the kernel thread on that different CPU core when receiving packets on the two connections. In general, synchronization between different CPU cores costs more than synchronization within the same CPU core, so this parallelization incurs a large cost. Also, since the CPU cores are not identical, cache locality is lowered and performance is degraded.
As related techniques, techniques for improving the processing speed of apparatuses using a multi-core CPU are disclosed in Patent Literatures 2, 3 and 4.