1. Field of the Invention
The present invention relates to a packet processing device, and in particular to a packet processing device which has a layered communication protocol.
In recent years, packet processing devices having a hierarchically structured communication protocol such as TCP/UDP/IP have come into wide use in routers, layer 3 (third layer) switches (hereinafter abbreviated as “L3 switch”), which perform network layer packet processing in hardware, network interface devices (Network Interface Card: hereinafter abbreviated as “NIC device”) of host systems, and the like.
Also, as communication technology has progressed remarkably in recent years, transmission speeds have steadily increased. As a result, the packet processing device is required to keep up with these speeds.
2. Description of the Related Art
While their main function is relaying packets in the network layer, the router and the L3 switch also terminate packets and pass them to an upper layer to support their own communications. Also, an NIC device mainly initiates and terminates packets to realize upper layer communications. NIC devices are also able to relay packets between plural NIC devices. The packet processing device has a function of passing packets between the layers.
Hereinafter, equipment which includes such a packet processing device will be described in more detail.
The router is a piece of equipment which is used as a packet forwarding device between networks, or subnetworks, in the Internet/Intranet. Between subnets, packet forwarding at the third layer level in the OSI Seven Layer protocol model is generally performed. If TCP/IP is used as the end-to-end communication protocol, the router relays an IP packet in the third layer.
Also, the router itself becomes a communication host by initiating and terminating the packets to support a remote control.
The L3 switch, like the router, is a piece of equipment which is used as a packet forwarding device between networks, or subnetworks, in the Internet/Intranet and performs packet forwarding at the third layer level in the OSI Seven Layer protocol model. However, unlike the router, the L3 switch realizes the IP relay processing using hardware logic.
In other words, it can be said that the L3 switch realizes packet forwarding function using hardware, which a router realizes using software.
Also, like a router, the L3 switch itself becomes a communication host by initiating and terminating the packets to support a remote control.
The NIC device is mainly used as an interface of a terminal host which is an initiation/termination point of a communication and, working with communication application software on the communication terminal, performs part of the protocol processing by which the application transmits and receives packets. Also, by being equipped with plural NIC devices, a terminal host can perform packet forwarding using communication software.
In such devices, various ideas for efficiently passing packets received by hardware to an upper layer operating system (hereinafter occasionally abbreviated as “OS”) have been proposed in order to speed up the protocol processing.
Hereinafter, conventional protocol processing methods used when a packet processing device receives and terminates IP packets will be described:
(1) Zero-copy TCP Method
Generally, in an OS which executes applications, the memory space where the OS kernel operates and the memory space where each user program operates are strictly separated from each other. Therefore, in order for the kernel to receive packets from the network and pass the data which has arrived at a TCP socket buffer to a communication application operating on the OS, it is necessary to perform a memory copy from the kernel memory space to the user memory space according to a regular procedure.
The zero-copy TCP method offers a mechanism which makes a part of the OS kernel memory space accessible from the application so that the application can access the receiving socket buffer of the TCP.
As a result, the memory copy which takes a long processing time in the protocol processing within the OS becomes unnecessary and the OS protocol processing is sped up.
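The difference between the regular copy and the shared access described above can be illustrated with a minimal sketch. It is only an analogy in user-level Python, not an OS implementation: `socket_buffer` is a hypothetical stand-in for the TCP receiving socket buffer in kernel memory, and a `memoryview` stands in for the mapping that makes that buffer visible to the application.

```python
# Hypothetical stand-in for the TCP receiving socket buffer in kernel memory.
socket_buffer = bytearray(b"payload received by TCP")

# Conventional path: the data is duplicated into a separate user-space buffer.
user_copy = bytes(socket_buffer)

# Zero-copy path (analogy): the application is given a view onto the same
# memory, so no bytes are duplicated.
user_view = memoryview(socket_buffer)

# A later write to the "kernel" buffer is visible through the view,
# but not through the copy, which shows that the view shares the storage.
socket_buffer[0] = ord("P")
first_byte_of_copy = user_copy[0]
first_byte_of_view = user_view[0]
```

The copy costs time proportional to the data length on every reception, whereas the view is created without touching the payload.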
(2) I2O method
This method is based on an interface specification in which the protocol processing or the like which has been performed by the OS on a main processor is to be performed by a sub processor on a network interface.
Namely, an OS different from the OS on the main processor is operating in a sub processor on a NIC, so that the communication protocol processing is performed within this OS and then it transfers data after the processing to the main processor OS.
As a result, a load of the main processor is reduced and the protocol processing is sped up.
(3) Communication control device of large-scale host computer
A communication method between host computers using communication processing devices connected to the host computers by channels is disclosed in the Japanese Patent Laid-open Publication No.4-292039.
Generally, in the channel between the host computer and the communication processing device, the communication is performed with large data packets generated in a transport layer of the host computer.
In this method, the large packets transmitted by a transmitting host computer are divided according to the MTU of a transmission line within a subnet section between the communication processing devices. A receiving communication processing device reassembles the original large packet, which is then passed from the communication processing device to a receiving host computer.
As a result, the efficiency of channel utilization between the host computer and the communication processing device can be improved.
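The division and reassembly described above can be sketched as follows. This is a simplified illustration, not the method of the cited publication: the function names `fragment` and `reassemble` are hypothetical, each fragment is tagged only with its byte offset, and per-fragment headers are ignored, so `mtu` here means payload bytes per fragment.

```python
def fragment(packet, mtu):
    """Divide a large packet into (offset, chunk) pairs of at most mtu bytes."""
    return [(off, packet[off:off + mtu]) for off in range(0, len(packet), mtu)]


def reassemble(fragments):
    """Rebuild the original packet; fragments may arrive in any order."""
    return b"".join(chunk for _, chunk in sorted(fragments))


# A 5120-byte transport layer packet sent over a line with a 1500-byte limit.
big_packet = bytes(range(256)) * 20
fragments = fragment(big_packet, 1500)
restored = reassemble(list(reversed(fragments)))
```

Only the section between the communication processing devices sees the small fragments; the channel to each host carries the packet whole.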
(4) L3 Switch Method
The L3 switch used in this method, as mentioned above, has a packet forwarding function in the network layer using hardware logic. Also, the L3 switch is regarded as the same as the existing router except that it has a hardware packet forwarding function in the network layer, so that it can be substituted for a router.
Moreover, some of the routers have highly developed communication control functions such as a fire wall, a Proxy server, and the like and determine whether or not each communication session may be passed. With the appearance of the L3 switch, an architecture which has such highly developed communication control functions on the L3 switch has appeared.
FIG. 19 shows the above-mentioned architecture wherein an OS kernel 20 operates on an L3 switch 81 and an application 11 runs on this OS kernel 20.
The L3 switch 81 transfers a receiving packet 90 as a transmitting packet 95 after switching it through a receiving interface 31, a routing process circuit 83, an SW portion 84, an output buffer 85, and a transmitting interface 46 (see route (1) in FIG. 19).
Also, a part of the receiving packet 90 is sent from the receiving interface 31 to a protocol processor 86 of the OS kernel 20 where filtering or the like is performed and then returned to the output buffer 85 (see route (2) in FIG. 19). Moreover, a part of the receiving packet 90 is sent to the application 11 such as the fire wall or the like for judging if the traffic is permitted to be forwarded (see route (3) in FIG. 19).
In such fire wall equipment, after the first several packets for identifying a communication session, or the packets for authentication, are identified by software, the filtering conditions of the communication session are recorded in a hardware filtering table so as to be identifiable by the hardware logic, and the subsequent session identifications are performed by the hardware at high speed.
In this case, the speed-up is achieved by processing the filtering on the hardware of the L3 switch 81. However, the several packets for the session identification, communication sessions for the management of the L3 switch, and complicated authentication functions are processed by software in the upper layer.
FIG. 20 shows the L3 switch 81 shown in FIG. 19 in more detail wherein the receiving interface (RX) 31 for inputting the receiving packet, a routing processor 75, a header rewriting portion 76, an input SW portion 77, a shared memory 78, an output SW portion 79, and the transmitting interface (TX) 46 for sending out the transmitting packet 95 are connected in series.
Also, the shared memory 78, a CAM 74, a CPU 71, and a main memory 72 are mutually connected by a bus 73. The shared memory 78 is further connected to a buffer control circuit 37 and the CAM 74 is connected to the routing processor 75.
In operation, the receiving packet (for example, Ethernet frame) 90 received from a network is accumulated in an input buffer (FIFO) of the receiving interface 31.
The L3 switch is able to switch an L2 (Layer 2) frame and an L3 (Layer 3) IP packet in hardware.
The L2 switch function learns a source address of an Ethernet frame inputted to each port to prepare a learning table which makes an Ethernet address correspond with an output interface.
The L2 switch function retrieves the learning table with a destination MAC address as a key after having performed the CRC check to the received Ethernet frame. In case the learning table indicates the Ethernet address of its own receiving interface, the L2 switch function performs the receiving process as the frame is addressed to this L3 switch.
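The learning and look-up behavior of the L2 switch function can be sketched as follows. The class name `L2Switch` and its members are hypothetical, the CRC check is omitted, and returning "flood" for an unknown destination is an assumption based on common L2 switch behavior rather than something stated above.

```python
class L2Switch:
    def __init__(self, own_macs):
        self.own_macs = set(own_macs)   # Ethernet addresses of the switch's own interfaces
        self.learning_table = {}        # Ethernet address -> output port

    def receive(self, in_port, src_mac, dst_mac):
        # Learn: the source address is reachable through the input port.
        self.learning_table[src_mac] = in_port
        # A frame addressed to the switch itself goes to the receiving process.
        if dst_mac in self.own_macs:
            return "local"
        # Otherwise look up the output port; unknown destinations are flooded (assumption).
        return self.learning_table.get(dst_mac, "flood")


switch = L2Switch(own_macs=["00:00:5e:00:53:ff"])
first = switch.receive(1, "00:00:5e:00:53:01", "00:00:5e:00:53:02")   # destination not yet learned
second = switch.receive(2, "00:00:5e:00:53:02", "00:00:5e:00:53:01")  # destination learned on port 1
to_self = switch.receive(1, "00:00:5e:00:53:01", "00:00:5e:00:53:ff")
```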
For an IP packet, a hardware routing table (not shown) is first retrieved with the destination IP address of the IP packet as a key. The routing table is written in the memory 74, which sometimes consists of a CAM (Content Addressable Memory), in the form of a table associating IP addresses with ports or Ethernet addresses, based on routing information managed by the OS. By retrieving the table with an IP address as a key, an output port or an Ethernet address is obtained.
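The routing table look-up can be sketched in software as follows. This is an illustration only: the table contents are made-up examples, the function name `lookup` is hypothetical, and the longest-prefix-match rule is an assumption drawn from ordinary IP routing practice, since the text above states only that the table is retrieved with the destination IP address as a key.

```python
import ipaddress

# Hypothetical routing table: network prefix -> (output port, next-hop Ethernet address).
routing_table = [
    (ipaddress.ip_network("10.0.0.0/8"), (1, "00:00:5e:00:53:01")),
    (ipaddress.ip_network("10.1.0.0/16"), (2, "00:00:5e:00:53:02")),
]


def lookup(dst_ip):
    """Return (port, next-hop address) for the most specific matching prefix, or None."""
    addr = ipaddress.ip_address(dst_ip)
    matches = [(net, hop) for net, hop in routing_table if addr in net]
    if not matches:
        return None
    # Assumed rule: the entry with the longest prefix wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]
```

A CAM performs the comparison against all entries in parallel, whereas this sketch scans the entries one by one.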
When the routing table look-up shows that the packet is not addressed to the device itself but must be forwarded to another router or host, the destination and source addresses of the Ethernet frame header are set, respectively, to the Ethernet address of the next hop obtained from the routing table look-up and the Ethernet address of the device's own output port, and the packet is sent to the input SW portion 77.
The input SW portion 77 refers to the learning table using the destination Ethernet address of the frame header to determine the output port. Afterwards, the frame itself is stored in the shared memory 78, and an entry is added to a queue of the buffer control circuit 37 which controls the output order of each output interface. Each queue entry records the start address of the shared memory 78 where the frame is stored.
The output SW portion 79 takes out, for each interface, the start address of the frame which should be transmitted next from the queue of the buffer control circuit 37, takes out the frame from the shared memory 78 by referring to the start address, and then transfers it to the transmitting interface 46.
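The interplay of the shared memory and the per-interface queues can be sketched as follows. The class name `SharedMemorySwitch` and its members are hypothetical. Two simplifications are assumed: each queue entry stores the frame length next to the start address so that the frame can be read back, and memory is allocated sequentially and never released.

```python
from collections import deque


class SharedMemorySwitch:
    def __init__(self, size, ports):
        self.memory = bytearray(size)                # models the shared memory
        self.next_free = 0                           # simple sequential allocator (assumption)
        self.queues = {p: deque() for p in ports}    # one output queue per interface

    def enqueue(self, port, frame):
        """Input side: store the frame and queue its start address."""
        start = self.next_free
        if start + len(frame) > len(self.memory):
            raise MemoryError("shared memory full")
        self.memory[start:start + len(frame)] = frame
        self.next_free += len(frame)
        self.queues[port].append((start, len(frame)))

    def dequeue(self, port):
        """Output side: take the next start address and read the frame."""
        if not self.queues[port]:
            return None
        start, length = self.queues[port].popleft()
        return bytes(self.memory[start:start + length])


sw = SharedMemorySwitch(size=64, ports=[1, 2])
sw.enqueue(1, b"frame-A")
sw.enqueue(2, b"frame-B")
sw.enqueue(1, b"frame-C")
port1_frames = [sw.dequeue(1), sw.dequeue(1), sw.dequeue(1)]
```

Because only addresses move through the queues, the frame body is written once and read once regardless of which interface transmits it.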
The transmitting interface 46 transmits the data frame onto the transmission line using the CSMA/CD access method.
Also, if the receiving packet is an IP packet addressed to the device itself, a queue addressed to the CPU 71 is prepared in the buffer control circuit 37 to switch that packet to the device itself. When the frame arrives at the queue addressed to the CPU 71, an interrupt is raised to the OS operating on the CPU 71 so that the frame is received by having the OS access the shared memory 78.
In order to maintain secrecy and to avoid attacks from outside, a private network such as an enterprise network has a mechanism which enables the access to the Internet from the inside of the private network but disables the access to the private network from the outside thereof by using a device called a “fire wall” to be connected to the Internet.
FIG. 21A shows the simplest example for connecting a private subnet 63 to the Internet 65 through a fire wall.
A host 61 which is connected to the private subnet 63 is communicating through a Proxy server router (L3 switch) 64 with a host 62 which is connected to the external Internet 65.
The packet is first filtered at the router, and the function of the fire wall then judges whether the packet is permitted to be forwarded.
While a dedicated device is generally used for the router 67 as shown in FIG. 21B, the fire wall, which is a software product in many cases, is operated on a general-purpose computer such as a workstation.
Namely, the network systems have been constructed by using distinct router 67 and fire wall 66.
However, the L3 switch (router) which has a forwarding function of IP packets using a hardware logic has rapidly improved the router's forwarding rate by speeding up the packet forwarding process in the network layer.
Since this L3 switch enables a packet filtering by the hardware, devices have appeared which perform a part of the filtering process of a fire wall product by hardware.
Such a product enables the hardware filtering by replacing the filtering condition or the like which is set for the fire wall software to perform within the OS with the filtering condition for the hardware to be set to the L3 switch hardware.
(5) Improved NIC Device With Reduced OS Interrupts
As for the NIC device mounted on a computer system, there are products which aim to avoid OS overhead by reducing the number of interrupts to the OS when packets are received by the hardware from the network.
Generally, when the NIC device receives a packet, it raises an external interrupt to the OS. The interrupted OS copies the packet from the receiving buffer on the NIC device to a memory space which is controlled by the OS. In this case, since the OS is interrupted every time the NIC device receives a packet for the packet processing, it has a big overhead.
Therefore, the improved NIC device method uses a technology in which the OS is interrupted after receiving a plurality of packets and then the OS copies each packet to the memory space of the OS to reduce the number of OS interrupts, thereby reducing the processing overhead within the OS.
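The effect of interrupting the OS once per plurality of packets can be sketched as follows. The class names `Nic` and `Os` and the parameter `batch_size` are hypothetical; an interrupt is modeled as a callback, and a batch size of 1 reproduces the conventional per-packet behavior.

```python
class Nic:
    def __init__(self, batch_size, on_interrupt):
        self.batch_size = batch_size      # packets accumulated before one interrupt
        self.on_interrupt = on_interrupt  # models the OS interrupt handler
        self.rx_buffer = []               # receiving buffer on the NIC device
        self.interrupts = 0

    def receive(self, packet):
        self.rx_buffer.append(packet)
        if len(self.rx_buffer) >= self.batch_size:
            batch, self.rx_buffer = self.rx_buffer, []
            self.interrupts += 1
            self.on_interrupt(batch)


class Os:
    def __init__(self):
        self.copies = 0                   # per-packet copies into OS memory

    def handler(self, batch):
        for _packet in batch:
            self.copies += 1              # each packet is still copied individually


os_plain, os_batched = Os(), Os()
plain = Nic(batch_size=1, on_interrupt=os_plain.handler)
batched = Nic(batch_size=4, on_interrupt=os_batched.handler)
for i in range(8):
    plain.receive(i)
    batched.receive(i)
```

With eight packets, the conventional device raises eight interrupts and the batching device raises two, while the number of per-packet copies is eight in both cases.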
Hereinafter, problems of devices equipped with such conventional packet processing techniques will be described.
(1) Although the Zero-copy TCP method eliminates the overhead of copying data from the OS kernel address space to the user address space, the operation of the OS having the hardware receive the packets is the same as the conventional software processing, so that the packet forwarding by the hardware is disturbed as the number of packets passed to the software processing increases.
(2) As for the packet reception seen from the upper layer OS by the I2O method, the number of interrupts to the OS is decreased and the size of the packet to be received is enlarged up to a data unit in the application layer. However, since a lower layer OS within the network interface performs an ordinary packet reception, from the viewpoint of packet forwarding by the hardware, the hardware processing is blocked when the number of the received packets increases.
(3) Communication control device of large-scale host computer
Present-day communication hosts implement network protocols including a data link layer for various LAN media, but they do not divide an originally large transport layer packet into smaller ones for transmission, and do not reassemble them into a large packet at the transport layer of a receiving host, as is disclosed in the Japanese Patent Laid-open Publication No.4-292039.
(4) As for the L3 switch method, the OS accesses the receiving buffer memory in the hardware after switching contexts in response to the interrupt raised for each packet reception. While this L3 switch performs the network layer packet forwarding using high-speed hardware, the memory access from the OS causes the packet forwarding of the hardware to be blocked and degrades the routing performance.
(5) As for the improved NIC device method with reduced OS interrupts, while the interrupt processing of the software decreases, the overhead of accessing the hardware memory by the OS does not decrease because the OS processes the received packets for each packet, not for each interrupt.
Also, the fire wall device mounting the L3 switch has problems at present as follows:
(1) Since this is devised by assuming the filtering and the policy setting which may be achieved by the hardware of the L3 switch, a complex session identification or the like requires a software processing.
(2) Since the hardware processing for the traffic via an application cannot be performed by the L3 switch, an application gateway which looks into the contents of a packet cannot achieve a satisfactory performance.