The present invention relates to a packet transfer apparatus and method for performing routing of a variable-length packet on the basis of TCP/IP (Transmission Control Protocol/Internet Protocol).
In packet communication based on TCP/IP, a transmitting host generally fragments data into small units called packets. The host then adds header information such as a transmission source address and a destination address to each packet, and sends the resultant packet to a network. The maximum packet length that each host can transmit to a network is determined by the MTU (Maximum Transmission Unit) supported by the data link layer protocol (second layer of the OSI reference model) of the network to which the host is connected.
For example, the MTU value used in major data link layer protocols is 1,500 bytes for Ethernet whose transmission rate is 10/100 Mbps, and 4,352 bytes for FDDI (Fiber Distributed Data Interface). When AAL5 is used in an ATM (Asynchronous Transfer Mode) network or a jumbo frame is supported by Ethernet whose transmission rate is 1 Gbps, the MTU value is about 9,000 bytes.
If the protocol of the transport layer (fourth layer of the OSI reference model) is TCP, the maximum data length which can be contained in each packet is called MSS (Maximum Segment Size). According to the IETF (Internet Engineering Task Force) standard RFC 879 “The TCP Maximum Segment Size and Related Topics”, the MSS value is determined by subtracting a default IP header length and TCP header length from the above-mentioned MTU.
For example, for IP version 4, the default IP header length is 20 bytes, and the default TCP header length is 20 bytes. For a 1,500-byte MTU supported by the data link layer protocol, the MSS is 1,460 bytes (= 1,500 − 20 − 20). When the transmitting and receiving hosts are connected by the same data link, the most efficient data transmission method is to fragment transmission data into MSS-sized units and transmit them as packets.
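The subtraction described above can be written as a short Python sketch. The function name and constants are illustrative assumptions, and the header lengths are the default values without options.

```python
# Default header lengths without options, as stated in the text.
IPV4_HEADER_BYTES = 20
TCP_HEADER_BYTES = 20


def mss_from_mtu(mtu: int) -> int:
    """Return the MSS obtained by subtracting the default headers from the MTU."""
    overhead = IPV4_HEADER_BYTES + TCP_HEADER_BYTES
    if mtu <= overhead:
        raise ValueError("MTU must exceed the combined IP and TCP header length")
    return mtu - overhead


if __name__ == "__main__":
    # MTU values taken from the data link examples in the text.
    for name, mtu in [("Ethernet", 1500), ("FDDI", 4352)]:
        print(name, mtu, mss_from_mtu(mtu))
```

For the 1,500-byte Ethernet MTU this gives 1,460 bytes, matching the figure in the text; for the 4,352-byte FDDI MTU it gives 4,312 bytes.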
In contrast, if the transmitting and receiving hosts exist on different networks, each host sends the other the MSS corresponding to the MTU of its directly connected data link, using additional information called an MSS option when a TCP connection is established. The smaller of the two MSSs is adopted between the two hosts. The transfer path may, however, include a link that supports only an MTU smaller than those of the data links directly connected to the transmitting and receiving hosts. In this case, a relay node notifies the transmitting host during data transfer that it must fragment data into shorter packets.
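As a minimal sketch of the MSS exchange described above, the following Python fragment models each host advertising the MSS derived from its own link MTU and both sides adopting the smaller value. The function names are hypothetical, and the sketch deliberately ignores any smaller MTU inside the path, which is handled separately by the relay node notification.

```python
HEADER_BYTES = 40  # default IPv4 header (20) plus default TCP header (20)


def advertised_mss(link_mtu: int) -> int:
    """MSS a host would place in its MSS option for a given link MTU."""
    return link_mtu - HEADER_BYTES


def negotiated_mss(mtu_host_a: int, mtu_host_b: int) -> int:
    """The smaller of the two advertised MSS values is adopted."""
    return min(advertised_mss(mtu_host_a), advertised_mss(mtu_host_b))


if __name__ == "__main__":
    # Host A on FDDI (MTU 4,352), host B on Ethernet (MTU 1,500).
    print(negotiated_mss(4352, 1500))
```

With host A on FDDI and host B on Ethernet, the adopted MSS is 1,460 bytes, the value advertised by the Ethernet host.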
This method is called path MTU discovery. “Path MTU” means the smallest MTU in the packet transfer path. Path MTU discovery is described in detail in W. Richard Stevens, “TCP/IP Illustrated, Volume 1: The Protocols”, pp. 382–387, March 1997. The transmitting and receiving hosts perform data communication using an MTU and MSS as large as possible for the following reason. In general, the larger the unit into which data is fragmented for transmission as packets, the higher the network transfer efficiency. Higher network transfer efficiency decreases the CPU utilization factor, i.e., the processing load, of the transmitting and receiving hosts.
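The relation between fragmentation unit and transfer efficiency can be illustrated with a rough Python estimate. It assumes 40 bytes of IP and TCP header per packet and ignores data link framing, acknowledgments, and retransmissions; the function names and the 1,000,000-byte transfer size are assumptions chosen for illustration.

```python
HEADER_BYTES = 40  # default IPv4 + TCP headers per packet (assumed, no options)


def packets_needed(total_bytes: int, mss: int) -> int:
    """Number of packets required to carry total_bytes (integer ceiling)."""
    return (total_bytes + mss - 1) // mss


def payload_efficiency(total_bytes: int, mss: int) -> float:
    """Fraction of transmitted bytes that is user data rather than headers."""
    packets = packets_needed(total_bytes, mss)
    return total_bytes / (total_bytes + packets * HEADER_BYTES)


if __name__ == "__main__":
    size = 1_000_000  # hypothetical transfer size in bytes
    for mss in (1460, 8960):
        print(mss, packets_needed(size, mss), round(payload_efficiency(size, mss), 4))
```

Under these assumptions, a 1,000,000-byte transfer needs 685 packets with an MSS of 1,460 bytes but only 112 packets with an MSS of 8,960 bytes, so both the header overhead and the number of per-packet processing events at the hosts fall as the MSS grows.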
However, in data transmission/reception via a wide area network, the transfer path cannot always use a large MTU because of restrictions imposed by data link specifications or problems caused by the physical channel quality. For example, when a general subscriber downloads various pieces of information from a server connected to a packet communication network by using a public access network, the data link (access link) between the subscriber house and the packet transfer apparatus nearest to the subscriber house often uses media such as a metallic line, coaxial cable, or radio channel. The physical quality of these media is not very high, which makes it difficult to adopt a large MTU.
In contrast, the data link between the server and the packet transfer apparatus nearest to the subscriber house often uses large-capacity, high-quality media such as optical fiber, which can support a relatively large MTU. The MSS of the transport layer is determined by the data link having the smallest MTU among the data links on the transfer path. Even if all other data links support large MTUs, the subscriber must communicate with the server by using a small MTU corresponding to the physical quality of the access link.
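The effect of a single small-MTU access link can be sketched as follows. The link names and MTU values are hypothetical and chosen only to illustrate the situation described above; the 40-byte header figure again assumes default IPv4 and TCP headers.

```python
HEADER_BYTES = 40  # default IPv4 + TCP headers (assumed, no options)

# Hypothetical end-to-end path: high-quality links toward the server,
# followed by a low-quality access link to the subscriber house.
LINK_MTUS = {
    "server link": 9000,
    "backbone link": 9000,
    "access link": 576,
}


def path_mtu(link_mtus):
    """Path MTU is the smallest MTU among the links on the transfer path."""
    if not link_mtus:
        raise ValueError("the path must contain at least one link")
    return min(link_mtus)


def path_mss(link_mtus):
    """MSS usable end to end, limited by the smallest-MTU link."""
    return path_mtu(link_mtus) - HEADER_BYTES


if __name__ == "__main__":
    mtus = list(LINK_MTUS.values())
    print(path_mtu(mtus), path_mss(mtus))
```

In this hypothetical path, the two 9,000-byte links contribute nothing to the usable segment size: the access link alone limits the path MTU to 576 bytes and the MSS to 536 bytes.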
A conventional packet transfer apparatus in a network may execute packet fragmentation in the data link layer or network layer (third layer of the OSI reference model), but does not perform packet fragmentation in the transport layer. Packet fragmentation in the transport layer may instead be done by a gateway, such as a World Wide Web proxy server, which terminates communication between the transmitting and receiving hosts once in the application layer (seventh layer of the OSI reference model). In this case, however, the connection is completely terminated in the transport layer. The processing load therefore increases greatly, and this method is not employed for a high-traffic relay.
A conventional method of performing communication between transmitting and receiving hosts using an MSS larger than the MSS determined by the smallest MTU among the MTUs supported by the respective transfer paths has been proposed in Hasegawa et al., “A Study on TCP Throughput Acceleration Mechanism using On-board CPU”, Multimedia, Distributed, Cooperative and Mobile Symposium, pp. 505–510, June 2000. This reference discloses the following method. Dedicated boards are mounted on the transmitting and receiving hosts, and segmentation and reassembly of TCP segments are performed on the dedicated boards. A plurality of packets actually transferred through the network are passed to the CPUs of the hosts at once.
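The general idea of segmentation and reassembly can be illustrated with a short Python sketch. It is not the implementation of the cited reference: the function names are hypothetical, and header processing, sequence numbers, and out-of-order arrival are all omitted.

```python
from typing import List


def segment(data: bytes, mss: int) -> List[bytes]:
    """Split a large buffer into chunks of at most mss bytes."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    return [data[i:i + mss] for i in range(0, len(data), mss)]


def reassemble(segments: List[bytes]) -> bytes:
    """Join in-order segments back into one buffer for delivery to the host CPU."""
    return b"".join(segments)


if __name__ == "__main__":
    payload = bytes(4000)              # one large unit handed over by the host
    pieces = segment(payload, 1460)    # what actually travels on the network
    print([len(p) for p in pieces])
    assert reassemble(pieces) == payload
```

A 4,000-byte buffer is carried as segments of 1,460, 1,460, and 1,080 bytes, while the host on either side handles it as a single unit.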
This method decreases the frequency of packet processing by the CPUs of the transmitting and receiving hosts, and thus decreases the load on the CPUs. However, the length of the packets actually transferred through the network is still determined by the smallest MTU among the MTUs supported by the transfer paths, as in the prior art, and the network transfer efficiency cannot be increased.
General data communication using TCP adopts “slow start”, which gradually increases the transfer rate every time the transmitting host receives an acknowledgment from the receiving host after the start of communication. In current implementations of slow start in almost every host, the initial value of the congestion window after the start of communication is set to the MSS or double the MSS, as described in Section 3 of the IETF standard RFC 2581 “TCP Congestion Control”. Every time the transmitting host receives an acknowledgment from the receiving host, the congestion window is increased by one MSS.
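The window growth described above can be sketched with a simplified Python model. It assumes one acknowledgment per full-sized segment, no delayed acknowledgments, no packet loss, and no slow start threshold; the function name and parameters are illustrative.

```python
from typing import List


def slow_start_windows(mss: int, initial_segments: int, round_trips: int) -> List[int]:
    """Congestion window (bytes) at the start of each round trip."""
    cwnd = initial_segments * mss
    windows = []
    for _ in range(round_trips):
        windows.append(cwnd)
        acks = cwnd // mss   # one acknowledgment per segment sent this round trip
        cwnd += acks * mss   # each acknowledgment enlarges the window by one MSS
    return windows


if __name__ == "__main__":
    print(slow_start_windows(mss=1460, initial_segments=1, round_trips=5))
    # prints [1460, 2920, 5840, 11680, 23360]
```

Because every acknowledged segment adds one MSS, the window doubles once per round trip under these assumptions, starting from only one or two segments.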
This method is efficient when the network transfer delay is relatively small between transmitting and receiving hosts. If, however, the network transfer delay is very large, like packet communication via a radio link, even transfer of a small amount of data takes a long transfer time.
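A rough estimate shows why a large delay is costly during slow start. The following sketch counts the round trips needed to deliver a given amount of data when the window doubles every round trip, and multiplies by the round-trip time. It ignores connection setup, link bandwidth, and loss, and the 100,000-byte transfer and the two round-trip times are hypothetical values.

```python
def round_trips_to_send(total_bytes: int, mss: int, initial_segments: int = 2) -> int:
    """Round trips needed to send total_bytes when the window doubles each trip."""
    cwnd = initial_segments * mss
    sent = 0
    trips = 0
    while sent < total_bytes:
        sent += cwnd   # a full window is sent in this round trip
        cwnd *= 2      # slow start doubles the window per round trip
        trips += 1
    return trips


def transfer_time_ms(total_bytes: int, mss: int, rtt_ms: int) -> int:
    """Approximate transfer time, counting only round-trip delays."""
    return round_trips_to_send(total_bytes, mss) * rtt_ms


if __name__ == "__main__":
    for rtt_ms in (10, 500):  # hypothetical short-delay and long-delay paths
        print(rtt_ms, transfer_time_ms(100_000, 1460, rtt_ms))
```

Under these assumptions, 100,000 bytes with an MSS of 1,460 bytes take six round trips: about 60 ms on a path with a 10 ms round-trip time, but about 3,000 ms on a path with a 500 ms round-trip time.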
As a conventional measure against this problem, an initial window increasing method has been proposed in which the initial value of the congestion window of a transmitting host is set larger than double the MSS. This method is described in the IETF standard RFC 2414 “Increasing TCP's Initial Window”. However, the network transfer delay varies from one pair of transmitting and receiving hosts to another. If the initial value of the congestion window of the transmitting host is uniformly set large to suit a receiving host that suffers a large network transfer delay, congestion occurs in the relay network.
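For reference, RFC 2414 gives the upper bound on the enlarged initial window as min(4*MSS, max(2*MSS, 4380 bytes)). The sketch below evaluates that bound for a few MSS values; the function name is an illustrative assumption.

```python
def rfc2414_initial_window(mss: int) -> int:
    """Upper bound on the initial window in bytes: min(4*MSS, max(2*MSS, 4380))."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    return min(4 * mss, max(2 * mss, 4380))


if __name__ == "__main__":
    for mss in (536, 1460, 8960):
        print(mss, rfc2414_initial_window(mss))
```

For an MSS of 1,460 bytes the bound is 4,380 bytes, i.e., three segments instead of one or two. The value depends only on the MSS, not on the delay of the particular path, which is why applying a large initial window uniformly can congest the relay network as noted above.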