In general, in a computer network, relay devices such as switches or routers are arranged between a server and a client to relay packets. Conventional relay devices perform only the processes of layer 2 (data link layer) and layer 3 (network layer) of the Open Systems Interconnection (OSI) reference model. In recent years, however, some relay devices have also performed higher-layer processes. Specifically, relay devices have been introduced that perform a load distribution process for distributing loads among servers; a firewall process for preventing attacks from outside; or a VPN process, such as a Secure Sockets Layer virtual private network (SSL-VPN) or Security Architecture for Internet Protocol (IPsec), that is used for hiding communication between a client and a server. Furthermore, because such relay devices can analyze the higher layers, a quality of service (QoS) process using higher-layer information is performed in some cases.
Furthermore, devices that are generally called network servers, which perform processes ranging from layer 2 and layer 3 up to the higher layers, have been introduced and arranged in computer networks. Because network servers are versatile, the load in a network is sometimes concentrated on them; therefore, high performance in their basic functions is desired. Because the relay process performed by a network server is not particularly complicated, it can be sped up by hardware implementation. In contrast, a higher-layer process performed by a network server is difficult to speed up with a simple hardware implementation because it is complicated and must remain flexibly expandable to accommodate new services. Accordingly, to speed up a higher-layer process in a network server, it is necessary to speed up the software process, in other words, to improve the processing performance of central processing units (CPUs).
In recent years, because the processing performance of a single CPU has nearly reached its limit, attempts have been made to speed up software processes by installing a plurality of CPUs or CPU cores (hereinafter, each referred to as a “CPU”) in a single device. In such a case, a software process is not sped up simply by having the plurality of CPUs perform the same process; therefore, upon receiving a plurality of packets to be processed, the network server assigns the packets to the plurality of CPUs, and the CPUs process the assigned packets in parallel.
FIG. 1 is a schematic diagram illustrating a basic architecture of the above-described parallel processing. As illustrated in FIG. 1, the packets to be processed are assigned, by an assignment processing unit 10, to CPUs 20-1 to 20-n (n is an integer equal to or greater than two), which are arranged in parallel, and are processed by the CPUs to which they are assigned. The assignment processing unit 10 can assign packets by various methods. For example, Japanese Laid-open Patent Publication No. 2005-64882 discloses a technology for determining the CPUs to which packets are assigned using hash values and information on layer 3 or lower. When such parallel processing is performed, it is important to consider dependencies among the plurality of CPUs. Specifically, the CPUs may refer to and update the same piece of information that they share. If the CPUs refer to and update that information simultaneously, a malfunction may occur. Accordingly, while a single CPU is accessing such shared information, it is necessary to perform an exclusive process that prohibits access from all the other CPUs.
In a complicated process such as the higher-layer process, the information shared by the plurality of CPUs is accessed frequently, and the exclusive process therefore occurs frequently as well. As the frequency of the exclusive process increases, the performance gain obtained from parallel processing decreases. Specifically, if the number of CPUs is doubled, processing performance is theoretically expected to double. In practice, however, the processing performance does not double because the exclusive process occurs between the CPUs. In an extreme case, the processing performance may even be lower than it was before the number of CPUs was doubled. Accordingly, to improve processing performance, it is extremely important to reduce the frequency of the exclusive process.
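One common way to reason about this limit, offered here only as an illustration and not as part of the described technology, is Amdahl's law. The sketch assumes that a fraction s of the per-packet work must run under the exclusive process (and is therefore serialized) while the remainder is divided evenly among n CPUs; the symbols s, n, and S(n), and the value s = 0.2, are assumptions introduced for this example.

```latex
S(n) = \frac{1}{s + \dfrac{1 - s}{n}},
\qquad
S(2)\Big|_{s = 0.2} = \frac{1}{0.2 + 0.4} \approx 1.67,
\qquad
\lim_{n \to \infty} S(n) = \frac{1}{s} = 5
```

Under these assumptions, doubling the CPUs from one to two yields roughly 1.67 times the performance rather than 2 times. The model ignores the overhead of acquiring and waiting for the exclusive process itself, which is one reason actual performance can fall even further.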
When the higher-layer process is performed, it is conceivable to use a method in which packets are assigned to different CPUs for each connection, such as a transmission control protocol (TCP) or user datagram protocol (UDP) connection, so that all packets transmitted by the same connection are processed by the same CPU. With this method, the connection information for each connection is accessed by only one CPU; therefore, an exclusive process for accessing the connection information, i.e., the basic information for the higher-layer process, becomes unnecessary.
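The per-connection assignment described above can be sketched as follows. This is a minimal illustration under stated assumptions: the text does not specify how a connection is mapped to a CPU, so the CRC32 hash of a five-field connection key, the `Connection` type, the `assign_cpu` function, and the sample addresses are all hypothetical.

```python
import zlib
from typing import NamedTuple


class Connection(NamedTuple):
    """Hypothetical connection key (not taken from the source text)."""
    protocol: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def assign_cpu(conn: Connection, num_cpus: int) -> int:
    """Map a connection to a CPU index; the same connection always maps to the same CPU."""
    key = "|".join(str(field) for field in conn).encode()
    # Assumption: CRC32 is used only because it is deterministic across runs.
    return zlib.crc32(key) % num_cpus


NUM_CPUS = 4
# One connection table per CPU: no table is ever touched by two CPUs,
# so no exclusive process is needed for connection information.
connection_tables = [dict() for _ in range(NUM_CPUS)]

tcp1 = Connection("TCP", "10.0.0.1", 40000, "10.0.0.9", 80)
tcp2 = Connection("TCP", "10.0.0.2", 40001, "10.0.0.9", 80)

for conn in (tcp1, tcp2, tcp1):
    cpu = assign_cpu(conn, NUM_CPUS)
    table = connection_tables[cpu]
    table[conn] = table.get(conn, 0) + 1  # per-connection packet count
```

Because every packet of a given connection lands on the same CPU, each entry in `connection_tables` has exactly one writer.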
However, if a QoS process is performed, even when packets are assigned for each connection, different CPUs may simultaneously access the same queue used for the QoS process, so the exclusive process cannot be eliminated. Specifically, in the QoS process, queues are generally mapped in accordance with physical ports and assigned in accordance with policy settings, not in accordance with connections. Accordingly, packets transmitted by different connections are usually mapped onto the same queue.
More specifically, as in the example illustrated in FIG. 2, both a packet processed by a CPU 20-1 corresponding to a connection TCP #1 and a packet processed by a CPU 20-2 corresponding to a connection TCP #2 may be mapped onto the same queue (in FIG. 2, the top queue) in a QoS processing queue group 30. To prevent the CPU 20-1 and the CPU 20-2 from mapping packets onto that queue at the same time, an exclusive process between the CPU 20-1 and the CPU 20-2 is needed.
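The situation in FIG. 2 can be sketched as follows, with two threads standing in for the CPU 20-1 and the CPU 20-2 and a lock standing in for the exclusive process. This is an illustrative sketch only; the policy in `map_to_queue`, the packet size, and the byte counter `queued_bytes` are assumptions introduced for the example.

```python
import threading
from collections import deque

# Three QoS queues; index 0 plays the role of the "top queue" in FIG. 2.
qos_queues = [deque() for _ in range(3)]
queued_bytes = [0, 0, 0]                       # shared per-queue statistics
queue_locks = [threading.Lock() for _ in qos_queues]

PACKET_SIZE = 100  # hypothetical fixed packet size in bytes


def map_to_queue(connection: str) -> int:
    """Hypothetical policy: every connection is mapped onto the top queue."""
    return 0


def cpu_worker(cpu_name: str, connection: str, count: int) -> None:
    for seq in range(count):
        q = map_to_queue(connection)
        # Exclusive process: only one CPU may update this queue at a time.
        with queue_locks[q]:
            qos_queues[q].append((cpu_name, connection, seq))
            queued_bytes[q] += PACKET_SIZE     # read-modify-write on shared state


threads = [
    threading.Thread(target=cpu_worker, args=("CPU 20-1", "TCP #1", 100)),
    threading.Thread(target=cpu_worker, args=("CPU 20-2", "TCP #2", 100)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Even though each connection is handled by its own CPU, both workers contend for `queue_locks[0]` on every packet, which is the overhead the text identifies.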
With this configuration, a dedicated queue processing unit performs the queue process on the plurality of queues shared by the plurality of processing units. Because the processing units that perform parallel processing do not access the queues themselves, no access conflict occurs among them with respect to the queues, and an exclusive process between the processing units becomes unnecessary. In other words, when a plurality of CPUs process packets in parallel, it is possible to improve processing performance by reducing the frequency of the exclusive process occurring between the CPUs.
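A minimal sketch of this arrangement follows, with threads standing in for the processing units and for the dedicated queue processing unit. The handoff mechanism is not specified in the text; the per-unit `mailboxes`, the `DONE` sentinel, and the fixed "high" policy are assumptions. The sketch relies on the documented thread safety of `collections.deque` appends and pops, with each mailbox having a single producer and a single consumer.

```python
import threading
import time
from collections import deque

NUM_UNITS = 2
DONE = object()  # hypothetical sentinel: a processing unit has finished

# One mailbox per processing unit (single producer, single consumer).
mailboxes = [deque() for _ in range(NUM_UNITS)]
# The QoS queues are touched ONLY by the queue processing unit.
qos_queues = {"high": deque(), "low": deque()}


def processing_unit(index: int, connection: str, count: int) -> None:
    for seq in range(count):
        queue_name = "high"  # hypothetical policy decision
        # Hand off an enqueue request instead of touching the QoS queue.
        mailboxes[index].append((queue_name, (connection, seq)))
    mailboxes[index].append(DONE)


def queue_processing_unit() -> None:
    finished = 0
    while finished < NUM_UNITS:
        drained = False
        for box in mailboxes:
            while box:
                item = box.popleft()
                drained = True
                if item is DONE:
                    finished += 1
                else:
                    queue_name, packet = item
                    qos_queues[queue_name].append(packet)
        if not drained:
            time.sleep(0.001)  # idle briefly instead of spinning


threads = [
    threading.Thread(target=processing_unit, args=(0, "TCP #1", 100)),
    threading.Thread(target=processing_unit, args=(1, "TCP #2", 100)),
    threading.Thread(target=queue_processing_unit),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

No lock appears around the QoS queues because only one thread ever reads or writes them; packets from the same connection also keep their original order.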
With this configuration, each of the processing units corresponds to a single connection and packets are assigned to processing units corresponding to connections that are used to transmit the packets. Accordingly, when a processing unit performs a process on a packet, no access conflict occurs with respect to information on each connection; therefore, it is possible to reliably reduce the frequency of an exclusive process occurring between the processing units.
With this configuration, when a packet is input, the packet is assigned to the processing unit corresponding to its receiving connection, whereas, when the packet is output, the packet is assigned to the processing unit corresponding to its sending connection. Accordingly, even when the receiving connection differs from the sending connection for a single packet, the packet can be assigned to the processing unit corresponding to each connection. As a result, for example, even when the receiving connection is terminated in the packet processing apparatus, the packet can be processed by the processing unit corresponding to the sending connection, which differs from the receiving connection. Therefore, no access conflict occurs among the processing units with respect to information on the same connection. Accordingly, it is possible to reliably reduce the frequency of the exclusive process occurring between the processing units.
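The direction-dependent assignment described above can be sketched as follows. The packet representation, the field names `receiving_connection` and `sending_connection`, and the CRC32-based `unit_for` mapping are hypothetical and are introduced only for this illustration.

```python
import zlib

NUM_UNITS = 4


def unit_for(connection_id: str) -> int:
    """Hypothetical mapping from a connection identifier to a processing unit."""
    return zlib.crc32(connection_id.encode()) % NUM_UNITS


def assign(packet: dict, direction: str) -> int:
    """Pick the processing unit by receiving connection on input, sending connection on output."""
    if direction == "input":
        return unit_for(packet["receiving_connection"])
    if direction == "output":
        return unit_for(packet["sending_connection"])
    raise ValueError(f"unknown direction: {direction!r}")


# A packet whose receiving connection is terminated in the apparatus and
# which is forwarded on a different sending connection.
packet = {"receiving_connection": "TCP #1", "sending_connection": "TCP #2"}

input_unit = assign(packet, "input")    # owner of TCP #1 information
output_unit = assign(packet, "output")  # owner of TCP #2 information
```

Each stage of the same packet is handled by the unit that owns the connection information it needs, so neither unit touches the other's connection state.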
With this configuration, the processing units identify the quality of packets, whereas the queue processing unit performs the queue process in accordance with the identified quality. Consequently, the identification of packet quality, which can be performed simultaneously on a plurality of packets, is performed in parallel, and the queue process, which may cause access conflicts with respect to the queues, is performed in order. Accordingly, it is possible to speed up the queue process and also to reduce the frequency of the exclusive process.
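The split between parallel quality identification and in-order queue processing can be sketched as follows. The classification rule in `identify_quality` (a threshold on a DSCP value), the packet fields, and the queue names are assumptions made for the example; the text does not state how quality is identified.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def identify_quality(packet: dict) -> str:
    """Hypothetical rule: DSCP values of 32 or more are treated as high quality."""
    return "high" if packet["dscp"] >= 32 else "low"


packets = [{"id": i, "dscp": dscp} for i, dscp in enumerate([46, 0, 34, 8])]
qos_queues = {"high": deque(), "low": deque()}

# Step 1 (parallel): quality identification has no shared state,
# so the processing units can run it simultaneously.
with ThreadPoolExecutor(max_workers=2) as pool:
    qualities = list(pool.map(identify_quality, packets))  # results keep input order

# Step 2 (in order): a single queue processing step touches the queues,
# so no exclusive process is needed.
for packet, quality in zip(packets, qualities):
    qos_queues[quality].append(packet["id"])
```

Only the second step touches shared queues, and it runs in a single thread of control.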
With this configuration, because the processors corresponding to the queue groups each perform a queue process, the queue process can be performed in parallel by the processors, thus further speeding up the queue process.
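One way to picture processors that each own a queue group is sketched below. The grouping of queues by port, the names in `QUEUE_GROUPS`, and the `QueueProcessor` class are hypothetical; the sketch shows only the ownership rule, not the concurrent execution.

```python
from collections import deque

# Hypothetical partition of the queues into groups, one group per processor.
QUEUE_GROUPS = {
    0: ["port0-high", "port0-low"],
    1: ["port1-high", "port1-low"],
}
# Reverse lookup: which processor owns a given queue.
OWNER = {name: proc for proc, names in QUEUE_GROUPS.items() for name in names}


class QueueProcessor:
    """Owns one queue group; no other processor touches these queues."""

    def __init__(self, queue_names):
        self.queues = {name: deque() for name in queue_names}

    def enqueue(self, queue_name: str, packet) -> None:
        # A KeyError here would mean a request was routed to the wrong owner.
        self.queues[queue_name].append(packet)


processors = {proc: QueueProcessor(names) for proc, names in QUEUE_GROUPS.items()}


def dispatch(queue_name: str, packet) -> int:
    """Route an enqueue request to the processor that owns the target queue."""
    proc = OWNER[queue_name]
    processors[proc].enqueue(queue_name, packet)
    return proc


dispatch("port0-high", "pkt-a")
dispatch("port1-low", "pkt-b")
dispatch("port0-high", "pkt-c")
```

Because the groups are disjoint, the processors can run their queue processes at the same time without an exclusive process between them.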
With this configuration, because a relay process for setting the destination of the packets is performed, in relay devices such as routers or switches, it is possible to improve process performance by reducing the frequency of the exclusive process between the CPUs.
With this configuration, a higher-layer process, for which parallel processing is difficult, is performed. Therefore, in multifunctional apparatuses such as network servers, it is possible to improve processing performance by reducing the frequency of the exclusive process between the CPUs.
With these configurations, because dedicated processors perform the queue process on the queues shared by a plurality of processors, no access conflict occurs with respect to the queues among the processors that perform parallel processing; therefore, an exclusive process between the processors becomes unnecessary. In other words, when CPUs process packets in parallel, it is possible to improve processing performance by reducing the frequency of the exclusive process between the CPUs.