In Japanese Patent Application Publication (JP-P2005-535226A: related art 1), a technique for a stateful protocol process at a high data rate is described. In this technique, a message is processed in a stateful protocol such as TCP, and high-speed data processing is achieved by distributing the messages belonging to a single flow to a plurality of protocol processing cores at different times.
Also, “An Analysis of TCP Processing Overhead” (IEEE Communications, June 1989), by D. Clark, V. Jacobson, J. Romkey, and H. Salwen, and “TCP Performance Re-visited” (IEEE International Symposium on Performance Analysis of Software and Systems, March 2003) by A. Foong et al. (related arts 2 and 3) are known. These papers report that one of the bottleneck factors in transferring TCP data lies in the buffer copying process, and not in the stateful protocol process portion of TCP.
In the related art 2, the processing cost per byte transfer of a TCP process is described (p. 27, Table 1). A memory copy from a user space to a system space takes 200 microseconds, a TCP checksum process takes 185 microseconds, and a network memory copy takes 386 microseconds. Since the TCP checksum process and the network memory copy are typically off-loaded to hardware by use of a network interface card, they can be removed from the bottleneck factors. However, the memory copy between the user space and the system space still remains as a severe bottleneck.
On the other hand, in the related art 3, a profiler is used to measure the load of a Linux kernel process. For a 64 KByte transmission request from a socket, the buffer copy and the checksum account for 34% of the processing, and for a 64 KByte reception request from the socket, the buffer copy and the checksum account for 41% of the processing. That is, the share of the buffer copying process is higher than the share of the protocol process.
The following analysis was performed by an inventor of this application. The technique of the related art 1 focuses specifically on high-speed packet processing in the stateful protocol process. That is, the technique described in the related art 1 does not assume an operation of reconstructing the packet data after the protocol process into an application stream and copying the reconstructed data into an application buffer.
According to the related arts 2 and 3, even if the protocol process itself is made faster by using the technique described in the related art 1, the performance of the entire system cannot be improved unless the buffer copying process into the application buffer is also made faster. Accordingly, it is necessary to increase the processing speed of the buffer copying process that copies the packet data after the protocol process into the application buffer. In doing so, the following problems arise.
First, the main bottleneck factor in the TCP receiving process is not the TCP protocol process but the buffer copying process that copies the reception packets to the application stream. Therefore, even if the processing speed of the TCP protocol process is increased by using a multi-core configuration, the performance is not improved at the system level.
Second, in the conventional TCP process, out-of-order data, whose reception order differs from the order of transmission from a counter host, are held in a reception packet buffer until the original data can be reconstructed. When the buffer copy is then performed all at once, the buffer copying process falls into an overload state, so that the system performance drops. The reason is as follows. When in-order data is received that fills the gap between the held out-of-order data so that the original data can be reconstructed, the buffer copying process of the in-order data and the copying process of the held out-of-order data are both executed within the time slot of that single packet reception.
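The copy burst described in the second problem can be illustrated with a minimal sketch. The function name `copy_bytes_per_arrival`, the segment format (sequence number, length), and the assumptions that segments do not overlap and that the stream starts at sequence number 0 are hypothetical and are introduced only for illustration; they are not taken from the related arts.

```python
def copy_bytes_per_arrival(segments):
    """Return the number of bytes copied to the application buffer
    in each packet reception slot.

    segments: list of (seq, length) tuples in arrival order.
    Assumption: segments do not overlap and the stream starts at seq 0.
    """
    held = {}       # out-of-order segments waiting in the reception buffer
    next_seq = 0    # next byte expected in order
    copied = []
    for seq, length in segments:
        held[seq] = length
        burst = 0
        # Copy every segment that has become contiguous with the stream.
        while next_seq in held:
            n = held.pop(next_seq)
            burst += n
            next_seq += n
        copied.append(burst)
    return copied


# Three out-of-order segments arrive first, then the in-order segment.
arrivals = [(1000, 1000), (2000, 1000), (3000, 1000), (0, 1000)]
print(copy_bytes_per_arrival(arrivals))  # [0, 0, 0, 4000]
```

In this sketch, nothing is copied while the gap at the head of the stream remains, and all 4000 bytes are copied in the single slot in which the in-order segment arrives.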
Third, even if the buffer copying process is made faster by distributing the process into a plurality of blocks, it is difficult to correctly determine whether or not an application stream has been prepared. For example, if whether or not the application stream has been prepared is determined by summing values indicating the completion of buffer copies, the determination cannot be made correctly when the out-of-order data overlap one another.
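The difficulty described in the third problem can be illustrated with a minimal sketch that compares a summation-based check with a check of the byte ranges actually covered. The function names `ready_by_sum` and `ready_by_coverage` and the (start, length) representation of a completed copy are hypothetical; the coverage check is shown only for contrast and is not presented as the method of any related art.

```python
def ready_by_sum(completed, stream_len):
    """Summation-based check: total copied bytes >= stream length."""
    return sum(length for _, length in completed) >= stream_len


def ready_by_coverage(completed, stream_len):
    """Range-based check: every byte in [0, stream_len) has been copied."""
    covered_to = 0
    for start, length in sorted(completed):
        if start > covered_to:
            return False  # a gap remains before this copied range
        covered_to = max(covered_to, start + length)
    return covered_to >= stream_len


# Completed copies as (start, length); the second one overlaps its neighbors.
completed = [(0, 1000), (500, 1000), (1000, 1000)]
print(ready_by_sum(completed, 3000))       # True
print(ready_by_coverage(completed, 3000))  # False
```

Here the copied byte counts sum to 3000, so the summation-based check reports that the stream is prepared, although bytes 2000 to 2999 have not been copied.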
Fourth, when the buffer copying process is distributed into the plurality of blocks, it cannot be determined that the application stream has been prepared even if the buffer copying process for the last portion of the application stream is completed. This is because a specific TCP buffer copying section may fall into an overload state so that its processing time becomes long, and therefore buffer copy completion notices are not always received in the order in which the buffer copy requests were generated.
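The reordering of completion notices described in the fourth problem can be illustrated with a minimal sketch. The timing model (one request issued per time unit, each copying section processing its own requests sequentially), the function name `completion_order`, and the section labels are hypothetical assumptions made only for illustration.

```python
def completion_order(requests):
    """Return request ids in the order their completion notices arrive.

    requests: list of (request_id, section, cost) in issue order.
    Assumption: request i is issued at time i, and each copying section
    processes its own requests one after another.
    """
    busy_until = {}   # section -> time at which it becomes free
    finished = []
    for issue_time, (req_id, section, cost) in enumerate(requests):
        start = max(issue_time, busy_until.get(section, 0))
        end = start + cost
        busy_until[section] = end
        finished.append((end, req_id))
    return [req_id for _, req_id in sorted(finished)]


# Section "A" is overloaded (cost 10); section "B" handles the rest quickly.
requests = [(0, "A", 10), (1, "B", 1), (2, "B", 1), (3, "B", 1)]
print(completion_order(requests))  # [1, 2, 3, 0]
```

In this sketch, the completion notice for the last portion (request 3) arrives before the notice for the first portion (request 0), so receiving the notice for the last portion does not show that the whole application stream has been copied.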