1. Field of the Invention
The present invention relates generally to data transfers in data processing network systems, and in particular to transfer of data blocks over the Internet and similar networks. Still more particularly, the present invention relates to improved Internet Protocol (IP) network communications.
2. Description of the Related Art
A computer network is a geographically distributed collection of interconnected communication media for transporting data between entities. An entity may consist of any device, such as a host or end station, that sources (i.e., transmits) and/or receives network messages over the communication media. Many types of computer networks are available, ranging from local area networks (LANs) to wide area networks (WANs). The end stations, which may include personal computers or workstations, typically communicate by exchanging discrete messages, such as frames or packets, of data according to predefined protocols. In this context, a communications protocol stack consists of a set of rules defining how the stations interact with each other.
The Internet has become an important computer network for transmission and distribution of data (text, code, image, video, audio, or mixed) and software. The primary protocols of the Internet communications architecture protocol stack are the Internet Protocol (IP) at the network layer (layer 3) and Transmission Control Protocol (TCP) at the transport layer (layer 4). The term TCP/IP is commonly used to refer to the Internet architecture, which has become a widely implemented standard communication protocol in Internet and Intranet technology, enabling broad heterogeneity between clients, servers, and the communications systems coupling them. IP provides a “datagram” delivery service at the network level. TCP builds a connection-oriented transport level service to provide reliable, sequential delivery of a data stream between two IP hosts. Reliability in TCP/IP transmissions is generally compromised by three events: data loss, data corruption, and reordering of data.
Data loss is managed in TCP/IP by a time-out mechanism. TCP maintains a timer (retransmission timer) to measure the delay in receiving an acknowledgment (ACK) of a transmitted segment from the receiver. When an ACK does not arrive within an estimated time interval, the retransmission time-out (RTO), the corresponding segment is assumed to be lost and is retransmitted. Further, because TCP is traditionally based on the premise that packet loss is an indication of network congestion, TCP will back off its transmission rate by entering “slow-start,” thereby drastically decreasing its congestion window to one segment.
TCP manages data corruption by performing a checksum on segments as they arrive at the receiver. The checksum algorithm is a 16-bit one's complement of a one's complement sum of all 16-bit words in the TCP header and data. The TCP sender computes the checksum on the packet data and loads this 2-byte value into the checksum field of the TCP header. The checksum computation also covers a 12-byte pseudo header that contains information from the IP header. The receiver computes the checksum on the received data (excluding the 2-byte checksum field in the TCP header), and verifies that it matches the checksum value in the header.
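By way of illustration only, the checksum procedure described above may be sketched as follows. The function names `internet_checksum`, `tcp_pseudo_header`, and `verify_tcp_checksum` are hypothetical names chosen for this sketch; the pseudo header layout (source address, destination address, a zero byte, the protocol number 6 for TCP, and the TCP length) and the position of the checksum field at bytes 16 and 17 of the TCP header are taken from the standard TCP definition rather than from the text above.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's complement of the one's complement sum
    of all 16-bit words in `data`."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    # Fold any carries above 16 bits back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tcp_pseudo_header(src: bytes, dst: bytes, tcp_length: int) -> bytes:
    """Build the 12-byte pseudo header from 4-byte IPv4 addresses."""
    return src + dst + struct.pack("!BBH", 0, 6, tcp_length)


def verify_tcp_checksum(pseudo: bytes, segment: bytes) -> bool:
    """Recompute the checksum with the checksum field zeroed and
    compare it with the value carried in the TCP header."""
    stored = int.from_bytes(segment[16:18], "big")
    zeroed = segment[:16] + b"\x00\x00" + segment[18:]
    return internet_checksum(pseudo + zeroed) == stored
```

Because the sum is only 16 bits wide and insensitive to the order of the 16-bit words, some multi-bit errors leave the checksum unchanged, which is consistent with the observation elsewhere in this description that the Internet checksum is not a strong check.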
TCP manages reordering of data, or out-of-order arrival of segments, by maintaining a reassembly queue that holds incoming packets until they can be arranged in sequence. Data in this queue is moved to the user's receive buffer, where it can be seen by the user, only once it is in sequence. When the receiver observes a “hole” in the sequence numbers of packets received, it generates a duplicate acknowledgement (DACK) for every subsequent “out-of-order” packet it receives. Until the missing packet is received, each received data packet with a higher sequence number is considered to be “out-of-order” and will cause a DACK to be generated.
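The reassembly queue and duplicate acknowledgement behavior described above may be sketched, in simplified form, as follows. The class name `TcpReceiverSketch` and its attributes are hypothetical, sequence numbers are assumed to count bytes starting from zero, and overlapping segments, window limits, and sequence number wrap are deliberately ignored.

```python
class TcpReceiverSketch:
    """Simplified receiver: queues out-of-order segments and returns
    the acknowledgement number that would be sent for each arrival."""

    def __init__(self, initial_seq: int = 0) -> None:
        self.next_expected = initial_seq   # next in-sequence byte
        self.reassembly_queue = {}         # seq -> data held out of order
        self.receive_buffer = bytearray()  # data visible to the user

    def receive(self, seq: int, data: bytes) -> int:
        if seq > self.next_expected:
            # A hole exists: hold the segment and repeat the last ACK,
            # which the sender sees as a duplicate acknowledgement.
            self.reassembly_queue[seq] = data
            return self.next_expected
        if seq == self.next_expected:
            self.receive_buffer += data
            self.next_expected += len(data)
            # Drain any queued segments that are now in sequence.
            while self.next_expected in self.reassembly_queue:
                queued = self.reassembly_queue.pop(self.next_expected)
                self.receive_buffer += queued
                self.next_expected += len(queued)
        return self.next_expected
```

In this sketch, a receiver that has taken bytes 0 through 3 and then receives segments starting at bytes 8 and 12 returns the acknowledgement number 4 for each of them; once the segment starting at byte 4 arrives, all queued data moves to the receive buffer.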
FIG. 7 is a schematic block diagram of an IP packet 100 comprising an IP header portion 110 and a payload/data portion 150. The IP header 110 comprises a version field 102 that indicates the format of the IP header, an Internet header length (IHL) field 104 that indicates the length of the Internet header and a type of service (TOS) field 106 that provides an indication of parameters of a desired quality of service. An IP total length field 108 specifies the length of the IP packet including the IP header and payload/data, while an IP identification field 110 specifies an identifying value assigned by the sending entity to aid in assembling the fragments of the packet.
The IP header further includes a more fragments (MF) flag 112, an IP fragment offset field 114 that specifies the placement of the fragment within the IP packet, and a time to live (TTL) field 116 that indicates a maximum time the packet is allowed to remain in the network. A protocol field 118 indicates the next level protocol used in the payload/data portion 150 of the packet, while a header checksum field 120 provides a checksum on only the IP header. The IP header further includes a source address field 122 containing the IP source address of the sending entity and a destination address field 124 containing the IP destination address of the receiving entity, along with an options field 126 and a padding field 128.
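For illustration, the fixed 20-byte portion of the IP header described above may be decoded as sketched below. The names `IPv4Header` and `parse_ipv4_header` are hypothetical; the bit positions (version and IHL sharing the first byte, the MF flag at mask 0x2000, and a 13-bit fragment offset) follow the IPv4 header layout of RFC 791. Options and padding, when present, are not decoded in this sketch.

```python
import struct
from typing import NamedTuple


class IPv4Header(NamedTuple):
    version: int
    ihl: int              # header length in 32-bit words
    tos: int
    total_length: int     # header plus payload, in bytes
    identification: int   # the IP-ID
    more_fragments: bool  # MF flag
    fragment_offset: int  # in 8-byte units
    ttl: int
    protocol: int
    header_checksum: int
    source: bytes
    destination: bytes


def parse_ipv4_header(raw: bytes) -> IPv4Header:
    """Decode the first 20 bytes of an IPv4 packet."""
    (ver_ihl, tos, total_length, ident, flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack(
        "!BBHHHBBH4s4s", raw[:20])
    return IPv4Header(
        version=ver_ihl >> 4,
        ihl=ver_ihl & 0x0F,
        tos=tos,
        total_length=total_length,
        identification=ident,
        more_fragments=bool(flags_frag & 0x2000),
        fragment_offset=flags_frag & 0x1FFF,
        ttl=ttl,
        protocol=protocol,
        header_checksum=checksum,
        source=src,
        destination=dst,
    )
```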
Fragmentation of an IP datagram (hereinafter referred to as a packet) is often necessary if the LAN standards associated with the source and destination entities are dissimilar (e.g., Ethernet and Token Ring). In such a case, the routers and switches of the network may need to alter the format of the packet so that it may be received by the destination entity. For example, if a packet originates in a network that allows a large packet size and traverses one or more links or local networks that limit the packet to a smaller size, the switch interconnecting the networks must fragment the IP packet. In the context of a TCP/IP networking environment, the fragmentation and reassembly procedure is well known and described in detail in the Internet Protocol, Request for Comments (RFC) 791, by Information Sciences Institute, University of Southern California (1981), which disclosure is hereby incorporated by reference. According to RFC 791, IP fragmentation apportions an IP packet into an arbitrary number of fragments that can be later reassembled.
To fragment an IP packet, either a source or an intermediate system (e.g., a switch) creates two or more new IP fragments and copies the contents of a portion of the IP header fields from the original packet into each of the IP headers of the fragments. The receiving entity of the fragments uses the contents of the IP identification field 110 (i.e., the packet identifier (ID)) to ensure that fragments of different packets are not mixed. That is, the identification field 110 is used to distinguish the fragments of one packet from those of another. The IP fragment offset field 114 informs the receiving entity about the position of a fragment in the original packet. The contents of the fragment offset field and the IP total length field 108 of each fragment determine the portion of the original packet covered by the fragment. The MF flag 112 indicates (e.g., when reset) the last fragment. The originating host of a complete IP packet sets the IP identification field 110 to a value that is unique for the source/destination address pair and protocol (e.g., TCP, UDP) for the time the packet will be active in the network. The originating host of the complete packet also sets the MF flag 112 to, e.g., zero and the IP fragment offset field 114 to zero.
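The fragmentation step described above may be sketched as follows. The `Fragment` record and the `fragment_payload` function are hypothetical names used only for this example, and header copying is reduced to carrying the same identifier in every fragment. The sketch assumes, per RFC 791, that the fragment offset is expressed in units of 8 bytes, so every fragment except the last carries a multiple of 8 data bytes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Fragment:
    ident: int            # copied from the original packet's IP-ID
    offset: int           # position in the original payload, 8-byte units
    more_fragments: bool  # MF flag; False only on the last fragment
    data: bytes


def fragment_payload(payload: bytes, ident: int, max_data: int) -> List[Fragment]:
    """Split `payload` into fragments carrying at most `max_data` bytes."""
    chunk = (max_data // 8) * 8  # round down to a multiple of 8 bytes
    if chunk == 0:
        raise ValueError("max_data must allow at least 8 bytes per fragment")
    fragments = []
    for start in range(0, len(payload), chunk):
        piece = payload[start:start + chunk]
        is_last = start + chunk >= len(payload)
        fragments.append(Fragment(ident, start // 8, not is_last, piece))
    return fragments
```

Splitting a 2,560-byte payload with `max_data=1000` under this sketch yields three fragments of 1,000, 1,000, and 560 bytes at offsets 0, 125, and 250, with the MF flag cleared only on the third.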
The IP identification field 110 is a 2-byte field, which must wrap around (i.e., must restart numbering at 1) after reaching 65535. On a high-speed network generating thousands of IP packets per second, the IP identifier (IP-ID) in field 110 can wrap around multiple times per second. For example, on gigabit Ethernet, 80,000 packets can be generated in a second, which means the wrap-around of the IP-ID can occur within a second. As networks become even faster, this wrap-around occurs even more frequently. For example, with 10 gigabit Ethernet, the wrap-around can occur in milliseconds.
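The arithmetic behind these figures may be checked with the short sketch below. The function names are hypothetical, and the identifier space is approximated as 65,536 values regardless of the exact restart value. The 800,000 packets-per-second rate mentioned afterward is an assumed tenfold scaling of the gigabit figure and is not taken from a measurement.

```python
ID_SPACE = 2 ** 16  # approximate number of distinct 2-byte IP-ID values


def wrap_interval_seconds(packets_per_second: float) -> float:
    """Time for a sender to consume the whole IP-ID space once."""
    return ID_SPACE / packets_per_second


def wraps_during(packets_per_second: float, window_seconds: float) -> float:
    """Number of IP-ID wrap-arounds that fit in a time window."""
    return window_seconds / wrap_interval_seconds(packets_per_second)
```

At 80,000 packets per second the identifier space is consumed in about 0.82 seconds, so roughly 36 wrap-arounds fit within a 30-second window; at an assumed 800,000 packets per second the interval drops to about 82 milliseconds.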
As a result, IP-ID wrap-around can be a cause of data corruption in the network if fragments belonging to a wrapped-around IP-ID are reassembled with fragments of a different IP packet identified by the original IP-ID. Upper layer protocols such as TCP or UDP may not be able to detect the corruption since the Internet checksum algorithm utilizing header checksum 120 to detect corruption is not very strong. This problem is addressed in IP by the use of a reassembly timer (see RFC 791). IP fragment reassembly uses the reassembly timer to discard fragments if all fragments of an identified packet have not been received within the reassembly timer period. Many implementations of IP fragment reassembly typically use 30 seconds for the reassembly timer. In very fast networks, the IP-ID will wrap around many times in this time interval, increasing the likelihood that fragments will be mismatched with the wrong fragments of a wrapped-around IP-ID.
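The mis-association described above, and the effect of the reassembly timer on it, may be illustrated with the simplified sketch below. The `Reassembler` class, its `add` method, and the explicit `now` argument standing in for a clock are all hypothetical; the sketch does not handle overlapping fragments and is not a model of any particular IP implementation.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, int, int]  # (source, destination, protocol, IP-ID)


class Reassembler:
    """Collects fragments per key and discards incomplete sets once
    the reassembly timer has expired."""

    def __init__(self, timeout: float = 30.0) -> None:
        self.timeout = timeout
        self._pending: Dict[Key, dict] = {}

    def add(self, key: Key, offset: int, more_fragments: bool,
            data: bytes, now: float) -> Optional[bytes]:
        # Drop every incomplete packet whose timer has run out.
        expired = [k for k, e in self._pending.items()
                   if now - e["started"] >= self.timeout]
        for k in expired:
            del self._pending[k]

        entry = self._pending.setdefault(
            key, {"started": now, "parts": {}, "total": None})
        entry["parts"][offset * 8] = data  # offset field is in 8-byte units
        if not more_fragments:
            entry["total"] = offset * 8 + len(data)
        if entry["total"] is None:
            return None  # last fragment not seen yet

        # The packet is complete only if the parts are contiguous.
        buf = bytearray()
        for start in sorted(entry["parts"]):
            if start != len(buf):
                return None
            buf += entry["parts"][start]
        if len(buf) != entry["total"]:
            return None
        del self._pending[key]
        return bytes(buf)
```

Suppose the first fragment of an old packet arrives at time 0 and its second fragment is lost, and the second fragment of a newer packet that reuses the same IP-ID arrives at time 1. With the 30-second timer the sketch joins the two into one corrupted packet. With a 0.5-second timer the stale fragment is discarded first and the newer packet reassembles correctly, although the same short timer would also discard legitimate fragments that are merely delayed on a slow link.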
The solution to IP-ID wrap around in fast networks has been to set the reassembly timer to a very low value, thereby reducing the number of duplicate IP-IDs outstanding. However, this causes performance degradation in network environments with varied speeds of both fast and slow networks because fragments will be mistakenly discarded along a slow link when the IP packets merely have not yet arrived. Discarding IP fragments in slow network connections will result in unnecessary retransmissions being required from the upper layers. As can be seen, it would be desirable to provide a solution to data corruption problems caused by IP-ID wrap-around on variable-speed networks that provides improved performance over the known solutions.