This invention relates to a mechanism for dispatching packets via a telecommunications network and to a network sender or router station, or node, incorporating such a mechanism.
The invention finds particular application to transmission of TCP information via an inter- or intra-network operating under an Internet Protocol.
FIG. 1 is a schematic representation of an instance of an Inter- or Intra-net with a router 10 being provided in the path between a source 12 and a destination 14. Between the source 12 (or sender node) and the router node 10, a net 16 is shown and between the router node 10 and the destination node 14 a further net 18 is shown. In practice, the net 16 and the net 18 can be one and the same and the router 10 effectively forms a “staging post” between the source 12 and the destination 14. In the following, reference is made to a dispatch mechanism. It should be appreciated that the dispatch mechanism could, in the present context, equally form part of the source 12, as opposed to being a separate “staging post” as illustrated in FIG. 1.
FIG. 2 is a schematic representation of the configuration of a station for a router 10 or source or destination 12, 14. These stations can be implemented using any appropriate technology. However, as illustrated in FIG. 2, the station is implemented by a server computer 20 comprising a system unit 22, optionally with a display 38, keyboard 40 and other input devices 42. It should be noted that the router 10 need not include a keyboard, display, etc. FIG. 2A is a schematic block representation of aspects of the contents of the system unit 22. As illustrated in FIG. 2A, the system unit includes a processor 28, memory 30, disk drives 24 and 26, and a communications adaptor 32 for connection to one or more telecommunications lines 34 for connection to the telecommunications network 16/18. As illustrated in FIG. 2A, the components of the system unit are connected via a bus arrangement 36. It will be appreciated that FIGS. 2/2A are a general schematic representation of one possible configuration for a server computer for forming a router or sender or destination station and that many alternative configurations could be provided.
Conceptually, the Internet provides three levels of services. At the lowest level, a connectionless delivery system provides a foundation on which everything rests. At the next level, a reliable transport service provides a high level platform. At the third level, application services are provided which rely on the reliable transport service.
A fundamental Internet service consists of an unreliable, best-effort, connectionless, packet delivery system. The service is described as being “unreliable” because delivery is not guaranteed. A packet may be lost, duplicated, or delivered out of order, but the Internet will not detect such conditions, nor will it inform the sender or receiver. The service is described as being “connectionless” because each packet is treated independently from all others. A sequence of packets sent from one machine to another may travel over different paths, or some may be lost while others are delivered. The service may be described as “best-effort” because the Internet makes an earnest attempt to deliver packets.
The protocol that defines the unreliable, connectionless, delivery mechanism is called the “Internet Protocol”, and is usually referred to by its initials IP. IP defines the formal specification of data formats, including a basic unit of data transfer and the exact format of all data passing across the Internet. IP also includes rules which specify how packets should be processed and how errors should be handled. In particular, IP embodies the idea of unreliable delivery and packet routing.
Further details of aspects of the Internet and TCP/IP protocols may be found, for example, in the following U.S. Pat. Nos. 5,293,379; 5,307,347; 5,307,413; 5,309,437; 5,351,237; and 5,535,199.
The basic unit of data transfer via the IP is termed an “Internet datagram”, or alternatively “IP datagram”, or simply “datagram”. A datagram comprises header and data areas, and source and destination addresses. There is no fixed size for a datagram. Bearing this in mind, and also the physical constraints of the underlying hardware services on which the Internet is based, it is necessary to divide the datagram into portions called “fragments”.
FIG. 5A illustrates the format of an Internet datagram. The same format is used for a fragment of a datagram.
The 4 bit version field (VERS) specifies the IP protocol version and is used to ensure that all of the nodes along the path of the datagram agree on the format.
The LEN field gives the datagram header length measured in 32 bit words. The TOTAL LENGTH field gives the length of the IP datagram measured in octets including the length of the header and data.
The SERVICE TYPE field contains handling details for the datagram.
Three fields in the datagram header, IDENT, FLAGS, and FRAGMENT OFFSET, control fragmentation and reassembly of datagrams. The field IDENT contains a unique identifier that identifies the datagram.
In the FLAGS field, a first bit specifies whether the datagram may be fragmented, and a second bit indicates whether this is the last fragment in the datagram. The FRAGMENT OFFSET field specifies the offset of this fragment in the original datagram, measured in units of 8 octets, starting at offset zero.
As each fragment has the same basic header format as a complete datagram, the combination of the FLAGS and FRAGMENT OFFSET fields is used to indicate that the headers relate to fragments, and to indicate the position of the fragment within the original datagram. The FRAGMENT OFFSET field identifies the position within the datagram, and the second of the FLAGS bits mentioned above (which is sometimes called the MORE FRAGMENTS flag) is used to indicate whether there are any more fragments in the datagram, or conversely that the fragment concerned is the last fragment of the datagram.
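The relationship between FRAGMENT OFFSET and the MORE FRAGMENTS flag can be illustrated with a short Python sketch. The function names `fragment_payload` and `reassemble` and the tuple layout are hypothetical, chosen for this illustration only; the sketch handles the data area alone and ignores headers and options.

```python
def fragment_payload(payload: bytes, max_data: int):
    """Split a data area into (offset_units, more_fragments, data) tuples.

    offset_units is measured in units of 8 octets, like FRAGMENT OFFSET.
    max_data is rounded down to a multiple of 8 so that every fragment
    except the last ends on an 8-octet boundary.
    """
    step = (max_data // 8) * 8
    if step <= 0:
        raise ValueError("max_data must be at least 8 octets")
    fragments = []
    for start in range(0, len(payload), step):
        chunk = payload[start:start + step]
        more = start + step < len(payload)  # MORE FRAGMENTS flag
        fragments.append((start // 8, more, chunk))
    return fragments


def reassemble(fragments):
    """Rebuild the data area from fragments that arrive in any order."""
    ordered = sorted(fragments, key=lambda f: f[0])
    return b"".join(chunk for _, _, chunk in ordered)


frags = fragment_payload(bytes(range(50)), 20)
```

Here a 50-octet data area with a 20-octet limit is carried in fragments of 16, 16, 16 and 2 octets; only the last fragment has the MORE FRAGMENTS flag cleared.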
The field PROTO is a form of type field. The HEADER CHECK SUM figure ensures integrity of header values.
SOURCE IP ADDRESS and DESTINATION IP ADDRESS contain 32 bit Internet addresses of the datagram's sender and intended recipient. The OPTIONS field and the PADDING field are optional in the datagram. The field labelled DATA represents the beginning of the data field.
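By way of illustration, the fixed 20-octet part of the header described above can be unpacked as in the following Python sketch. The function name `parse_ip_header` and the dictionary keys are hypothetical. The sketch assumes the standard IPv4 layout, which also carries a time-to-live octet not discussed above, and it does not decode the OPTIONS field.

```python
import socket
import struct


def parse_ip_header(raw: bytes) -> dict:
    """Unpack the fixed part of an IP datagram header (no OPTIONS)."""
    (vers_len, service_type, total_length, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "vers": vers_len >> 4,                       # 4 bit VERS field
        "len_words": vers_len & 0x0F,                # LEN, in 32 bit words
        "service_type": service_type,
        "total_length": total_length,                # octets, header + data
        "ident": ident,
        "dont_fragment": bool(flags_frag & 0x4000),  # first FLAGS bit
        "more_fragments": bool(flags_frag & 0x2000), # second FLAGS bit
        "fragment_offset": flags_frag & 0x1FFF,      # units of 8 octets
        "ttl": ttl,                                  # not described in the text
        "proto": proto,
        "header_checksum": checksum,
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
    }


# A hand-built sample header: a non-final fragment at offset 2 (16 octets).
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0x1234, 0x2000 | 2,
                     64, 6, 0,
                     socket.inet_aton("10.0.0.1"), socket.inet_aton("10.0.0.2"))
fields = parse_ip_header(sample)
```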
As mentioned above, above the IP layer of the Internet protocol structure one service which is provided is a reliable transport service which is typically called the “reliable stream transport service”, defined by the Transmission Control Protocol (TCP). Although TCP is provided over the Internet, it is in fact an independent general purpose protocol which can also be used with other delivery systems. TCP makes very few assumptions regarding the underlying network, and it can also be used over a single network like Ethernet, as well as over a complex Internet, or Intranet.
TCP provides a reliable stream delivery service which can be contrasted with the User Datagram Protocol (UDP), an unreliable datagram service which is also provided over the Internet. Whereas UDP provides an unreliable delivery service because delivery is not guaranteed, TCP provides a more complex structure which does ensure reliable delivery in the form of a stream.
UDP provides unreliable packet delivery, whereby packets may be lost or destroyed when transmission errors interfere with data, when network hardware fails, or when networks become too heavily loaded to accommodate the load presented. TCP, on the other hand, operates by providing delivery by means of a stream of bits, divided into eight-bit octets or bytes.
Given that the underlying Internet protocol is unreliable, TCP transmissions operate in accordance with a technique known as positive acknowledgement with retransmission. The technique requires a recipient to communicate with the source, sending back an acknowledgement message every time it receives data. The sender keeps a record of each packet that it sends and waits for an acknowledgement before sending the next packet. The sender also starts a timer when it sends its packet and retransmits a packet if the timer expires before the acknowledgement arrives. FIG. 3A is a schematic representation of the transmission and receipt of packets and acknowledgements. The left hand side of FIG. 3A represents events at a sender side 50, the right hand side represents events at a receiver side 52 and the middle portion represents network messages passing between the sender and the receiver.
At 54, the sender 50 (e.g., the router 10) sends a packet P1 to the receiver 52 (e.g., the destination 14) via the network and starts a timer for message P1. When the receiver 52 receives, 56, the packet P1, it sends, 58, an acknowledgement A1. When the acknowledgement A1 is received, 60, at the sender 50, the sender can cancel the timer and send, 62, the next packet P2 to the receiver 52, setting a timer for the message P2. When the receiver 52 receives, 64, the packet P2, it sends, 66, a second acknowledgement A2 to the sender 50. Once again the sender can cancel the timer. The process then continues with the transmission of further packets on receipt of the second acknowledgement A2.
The process illustrated in FIG. 3A, is a representation of the system operating properly with responses received within an expected time (RTT or round-trip-time). The RTT concept will be described later. However, FIG. 3B illustrates what might happen when a packet is not received (for example because a packet is lost).
In FIG. 3B, a packet P1 is sent at 70 and a timer (RTT timer) is started. The packet P1 is lost in transmission between the sender 50 and the receiver 52. Accordingly, the packet is not received at the receiver at time 72, when it should have been, and no acknowledgement is sent at 74, as should have occurred. Likewise, no acknowledgement is received at the sender 50 at 76. At 78 the RTT timer times out, indicating that a packet has been lost. Accordingly, the sender retransmits packet P1 as P1′ at 80. This is then successfully received at the receiver 52 at 82, which returns at 84 the acknowledgement A′2 to the sender, where it is received at 86.
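The exchanges of FIGS. 3A and 3B can be mimicked with a small deterministic simulation. This is a minimal Python sketch, not an implementation of TCP: the function name `stop_and_wait` and the event tuples are hypothetical, and packet loss is modelled by a caller-supplied set rather than by a real network or timer.

```python
def stop_and_wait(packets, lost_attempts):
    """Simulate positive acknowledgement with retransmission.

    lost_attempts is a set of (packet_index, attempt_number) pairs that
    the simulated network drops. Returns a list of (event, packet, attempt).
    """
    log = []
    for index, packet in enumerate(packets):
        attempt = 1
        while True:
            log.append(("send", packet, attempt))      # timer starts here
            if (index, attempt) in lost_attempts:
                # No acknowledgement arrives, so the timer expires and
                # the same packet is sent again.
                log.append(("timeout", packet, attempt))
                attempt += 1
                continue
            log.append(("ack", packet, attempt))       # timer cancelled
            break
    return log


# P1 is lost on its first attempt, as in FIG. 3B; P2 goes through directly.
log = stop_and_wait(["P1", "P2"], {(0, 1)})
```

The sender never moves on to P2 until P1 has been acknowledged, which is the limitation that motivates the sliding window.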
The basic transfer protocol described with reference to FIGS. 3A and 3B above has the disadvantage that an acknowledgement must be received before a further packet can be sent. In order to increase the dataflow, an Internet stream service can employ a concept known as a “sliding window”. The sliding window approach is to enable a sequence of packets to be transmitted before receiving an acknowledgement. The number of packets which can be transmitted before receiving an acknowledgement is defined by the number of packets within the “window”. Accordingly, for a sequence of packets 1-6, a window might extend from packet 1 to packet 3. Accordingly, all of the first three packets can be transmitted without waiting for an acknowledgement. However, packet 4 can only be transmitted when an acknowledgement has been received for packet 1. On receipt of the acknowledgement for packet 1, packet 4 is then sent. At this stage packet 5 cannot be sent until an acknowledgement has been received for packet 2. It can be seen therefore that the window effectively slides along the sequence of packets as acknowledgements are received. A sliding window protocol remembers which packets have been acknowledged and keeps a separate timer for each unacknowledged packet. If a packet is lost, the timer expires and the sender retransmits that packet. As the sender slides its window, it moves past an acknowledged packet. At the receiving end, a similar window is maintained, for accepting and acknowledging packets as they arrive. It will be appreciated that the protocol is relatively complex, but does provide for more efficient transfer. FIG. 4 is a schematic representation of the exchange of packets for a sliding window of size 3. This shows how the window W slides along the list of packets.
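The order of events for the six-packet example above can be reproduced with the following Python sketch. It is a simplification under stated assumptions: the function name `sliding_window` is hypothetical, no packet is lost, acknowledgements arrive in order, and the per-packet timers are omitted.

```python
from collections import deque


def sliding_window(num_packets, window_size):
    """Return the send/ack event order for a lossless sliding window."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    events = []
    in_flight = deque()          # packets sent but not yet acknowledged
    next_packet = 1
    while next_packet <= num_packets or in_flight:
        # Send as many packets as the window allows.
        while next_packet <= num_packets and len(in_flight) < window_size:
            events.append(("send", next_packet))
            in_flight.append(next_packet)
            next_packet += 1
        # The acknowledgement for the oldest packet arrives and the
        # window slides forward by one position.
        events.append(("ack", in_flight.popleft()))
    return events


events = sliding_window(6, 3)
```

With a window of 3, packets 1 to 3 are sent back to back, and each later packet is released by the acknowledgement of the packet three positions before it.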
The present invention finds application to a reliable stream service such as that provided by the Internet. This service is defined by the Transmission Control Protocol, or TCP. The combination of the TCP protocol and the underlying Internet protocol (IP) is often referred to as TCP/IP.
TCP specifies the format of the data and acknowledgements that two computers are to exchange to achieve reliable transfer, as well as the procedure to ensure that data arrives correctly. The TCP Protocol assumes very little about the underlying communication system and can be used with a variety of packet delivery systems including the IP datagram delivery service. The TCP service resides above the IP layer which in turn lies above the network interface of the Internet.
FIG. 5B represents the format of a segment used to communicate between two nodes under the TCP. Each segment is divided into two parts, a header followed by data. The header comprises SOURCE PORT and DESTINATION PORT fields containing the TCP PORT numbers that identify the application programs at the ends of the connection. The SEQUENCE NO. identifies the position in the sender's byte stream of the data in the segment. The ACKNOWLEDGEMENT NO. field identifies the position of the highest byte that the source has received. The SEQUENCE NO. refers to the stream flowing in the same direction as the segment, while the ACKNOWLEDGEMENT NO. refers to the stream flowing in the opposite direction. The OFF field contains an integer that specifies the offset of the data portion of the segment. This is needed because the OPTIONS field varies in length. The field RES is reserved for future use. Segments can be used to carry an acknowledgement or data or requests to establish or close a connection. The CODE field is used to determine the purpose and content of the segment. The WINDOW field specifies the buffer size that the destination is willing to accept every time it sends a segment. The CHECK SUM field includes a TCP header check sum. The URGENT POINTER field is used for identifying urgent data.
The OPTIONS field is used to communicate information with the destination. For example, the OPTIONS field can be used to specify a maximum segment size. The DATA indication represents the start of the data field of the segment.
As the TCP sends data in variable length segments, acknowledgements necessarily refer to a position in the stream, and not to packets or segments. Each acknowledgement specifies one greater than the highest byte position that has been received. Accordingly, acknowledgements specify the number of the next byte that the receiver expects to receive.
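The rule that an acknowledgement names the next byte expected can be shown with a few lines of Python. This is an illustrative sketch only: the function name `next_expected_byte` is hypothetical, segments are given as (sequence number, length) pairs, and sequence number wrap-around is ignored.

```python
def next_expected_byte(received_segments, initial_seq=0):
    """Return the acknowledgement number for the segments received so far.

    received_segments is an iterable of (sequence_no, length) pairs, in
    any order. The result is one greater than the highest byte position
    received without a gap.
    """
    expected = initial_seq
    for seq, length in sorted(received_segments):
        if seq > expected:
            break                      # a gap: later bytes cannot be acknowledged
        expected = max(expected, seq + length)
    return expected
```

For example, segments covering bytes 0 to 99 and 100 to 149 give an acknowledgement number of 150, whereas segments covering bytes 0 to 99 and 200 to 249 give 100, because bytes 100 to 199 are still missing.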
Reference has been made above to the round-trip time (RTT). This represents the average time from the transmission of a segment until receipt of the corresponding acknowledgement. The RTT needs to be set dynamically as the round-trip time can vary over time. FIG. 6 is a schematic representation of the way in which RTT may vary in response to an event Hi. Although the RTT may increase dramatically, the response time actually generated by the algorithm used within the system can vary more slowly. As a result the RTT and response curves will diverge, at least for a time.
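The divergence between the actual RTT and the estimate maintained by the system can be illustrated numerically. The text above does not name the estimating algorithm; as an assumption, the Python sketch below uses exponential smoothing of the kind described in the original TCP specification, with a hypothetical function name `smoothed_rtt` and an assumed smoothing factor of 0.875.

```python
def smoothed_rtt(samples, alpha=0.875):
    """Return the running RTT estimate after each measured sample.

    Each new estimate is alpha * old_estimate + (1 - alpha) * sample.
    The estimate is seeded with the first sample (an assumption of
    this sketch).
    """
    if not samples:
        return []
    estimates = []
    estimate = samples[0]
    for sample in samples:
        estimate = alpha * estimate + (1 - alpha) * sample
        estimates.append(estimate)
    return estimates


# The measured RTT jumps from 100 to 500 time units after the third sample.
estimates = smoothed_rtt([100, 100, 100, 500, 500, 500])
```

After the jump the estimate rises only gradually (150, then 193.75, then about 232), so a retransmission timer derived from it would expire well before acknowledgements delayed by 500 time units arrive.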
A consequence of variations in network load and of the queuing of packets by routers and sending stations is that the actual RTT can increase, due to the time that the packet is held in the queue. As a result, it is possible that unnecessary retransmission of packets can occur where an acknowledgement has not been received. This is represented schematically in FIG. 7. It can be seen in FIG. 7 that due to the delayed transmission of packet P2, message P2 is retransmitted before receipt of the acknowledgement A2. The unnecessary retransmission of the packet P2 leads to an unnecessary increase in the traffic over the network, which can aggravate congestion on the network.
In summary, therefore, the TCP layer includes a retransmission mechanism to recover from the loss of data on the underlying network. The interval between retransmissions is dynamically calculated by the TCP layers so as to adapt it to the response time of the network. However, when the load on the network increases, the TCP layer cannot adapt its retransmission interval as fast as the response time of the network increases. As a result, the TCP layer retransmits packets when this is not actually necessary because the lack of an acknowledgement is not due to non-receipt of the packet, but merely to delayed receipt thereof. The effect of retransmissions is to cause yet more traffic on the network, thereby once again increasing the response time of the network. This effect is well known in the Internet community and is typically called “congestion collapse”.
Accordingly, it is an aim of the present invention to address this problem.
In accordance with an embodiment of the invention, there is provided a mechanism for dispatching a sequence of packets via a telecommunications network, which dispatching mechanism comprises a queue for packets for transmission and a queue controller responsive to receipt of a new packet for transmission to compare parameters of the new packet to parameters of any packet already in the queue, the queue controller determining whether to queue or to drop the new packet depending on the result of the comparison(s).
By comparing a new packet to packets already queued for transmission, unnecessary duplicated transmission of a packet can be avoided where packet transmission has been delayed, for example due to network congestion. Avoiding retransmission of the queued packet avoids aggravating the network congestion. Where the new packet is a retransmission of the queued packet, then retransmission would be unnecessary as it is known that the queued packet has not been lost, but has merely been delayed.
Preferably, the queue is implemented as a linked list structure as this provides a flexible mechanism for allowing changes in the size of the queue and the addition and deletion of queue entries. Preferably the linked list comprises entries containing information relating to the packet flow as well as packet identity information and a separate pointer to the packet itself. This also allows the queue controller to readily traverse the queue to perform the comparison(s) referred to above.
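One possible shape for such a linked list is sketched below in Python. All names (`QueueEntry`, `LinkedQueue`, the field names) are hypothetical and are not taken from any particular embodiment; the `packet` attribute simply holds a reference to the packet data, standing in for the separate pointer mentioned above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueueEntry:
    flow: tuple                 # flow information (addresses and ports)
    seq: int                    # packet identity information
    ack: int
    packet: bytes               # reference to the packet itself
    next: Optional["QueueEntry"] = None


class LinkedQueue:
    """Singly linked list of queue entries with head and tail references."""

    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, entry):
        """Add an entry at the tail of the queue."""
        entry.next = None
        if self.tail is None:
            self.head = self.tail = entry
        else:
            self.tail.next = entry
            self.tail = entry

    def pop_front(self):
        """Remove and return the entry at the head, or None if empty."""
        entry = self.head
        if entry is None:
            return None
        self.head = entry.next
        if self.head is None:
            self.tail = None
        entry.next = None
        return entry

    def __iter__(self):
        """Traverse the queue from head to tail without removing entries."""
        entry = self.head
        while entry is not None:
            yield entry
            entry = entry.next
```

Because each entry carries the flow and identity information alongside the packet reference, a traversal can compare entries without inspecting the packet data itself.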
Preferably, the queue controller is arranged to compare flow parameters of the new and the queued packet(s) including source and destination parameters to establish whether the new and queued packets relate to the same packet flow. In a TCP environment, the source parameters can comprise a source IP address and a source TCP port and the destination parameters can comprise a destination IP address and a destination TCP port.
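The flow comparison for a TCP environment amounts to comparing four values. A minimal Python sketch follows; the class name `FlowKey`, its field names and the function `same_flow` are hypothetical labels for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowKey:
    source_ip: str
    source_port: int
    destination_ip: str
    destination_port: int


def same_flow(a: FlowKey, b: FlowKey) -> bool:
    """Two packets relate to the same flow when all four parameters match."""
    return a == b


new_packet_flow = FlowKey("10.0.0.1", 1234, "10.0.0.2", 80)
queued_packet_flow = FlowKey("10.0.0.1", 1234, "10.0.0.2", 80)
other_flow = FlowKey("10.0.0.1", 1235, "10.0.0.2", 80)
```

A difference in any one of the four parameters, such as the source TCP port in `other_flow`, is enough to place the packets in different flows.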
In a preferred embodiment, the queue controller is further arranged to compare packet sequence numbers and/or acknowledgement numbers for the new packet and the queued packet(s) to establish whether the new packet is a retransmission of a queued packet.
In a preferred embodiment for a TCP environment, the queue controller is arranged to determine that a new packet is a retransmission of a queued packet if:
i) the new packet sequence number equals the queued packet sequence number; and
ii) the new packet acknowledgement number is greater than or equal to the queued packet acknowledgement number.
The queue controller is arranged to add the new packet to the queue when it is determined that the new packet is not a retransmission of a queued packet.
In a preferred embodiment of the invention, the queue controller is arranged to drop a new packet when it is determined that the new packet is a retransmission of a queued packet and the length of the queued packet is greater than or equal to that of the new packet. It is also arranged to replace a queued packet in the queue by the new packet when it is determined that the new packet is a retransmission of the queued packet and the length of the new packet is greater than that of the queued packet.
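The queue, drop and replace decisions described above can be sketched as follows in Python. This is an illustration under stated assumptions rather than a definitive implementation: the names `Packet`, `DispatchQueue` and `offer` are hypothetical, a Python list stands in for the linked list, and the retransmission test is supplied by the caller so that the sketch does not fix the exact comparison. The usage lines pass a deliberately simplified test that looks at the sequence number only.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Packet:
    flow: tuple      # (source IP, source port, destination IP, destination port)
    seq: int         # sequence number
    ack: int         # acknowledgement number
    length: int      # packet length in octets


class DispatchQueue:
    def __init__(self, is_retransmission: Callable[[Packet, Packet], bool]):
        self.entries: List[Packet] = []
        # Caller-supplied test: is_retransmission(new, queued) -> bool.
        self.is_retransmission = is_retransmission

    def offer(self, new: Packet) -> str:
        """Queue, drop or substitute a new packet; return the action taken."""
        for i, queued in enumerate(self.entries):
            if queued.flow != new.flow:
                continue                      # different packet flow
            if self.is_retransmission(new, queued):
                if queued.length >= new.length:
                    return "dropped"          # queued packet already covers it
                self.entries[i] = new         # longer retransmission replaces it
                return "replaced"
        self.entries.append(new)
        return "queued"


# Simplified demonstration test (an assumption): sequence numbers match.
queue = DispatchQueue(lambda new, queued: new.seq == queued.seq)
flow = ("10.0.0.1", 1234, "10.0.0.2", 80)
results = [
    queue.offer(Packet(flow, 1000, 1, 100)),   # first copy
    queue.offer(Packet(flow, 1000, 1, 100)),   # same length retransmission
    queue.offer(Packet(flow, 1000, 1, 200)),   # longer retransmission
    queue.offer(Packet(flow, 1200, 1, 100)),   # next packet in the flow
]
```

Replacing the queued entry in place keeps the packet at its original position in the queue, so a longer retransmission does not lose its turn for dispatch.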
The dispatch mechanism can be implemented by means of software operating on computer hardware.
In accordance with another aspect of the invention, there is provided a station for sending a sequence of packets via a telecommunications network, the station including a dispatch controller comprising:
a dispatch queue for packets;
a queue controller arranged to compare flow and packet sequence parameters of a new packet for dispatch to flow and packet sequence parameters of queued packets and arranged to respond to detection of the new packet being a retransmission of a queued packet relating to the same flow path to discard either the new packet or the queued packet. The station can, for example, be a router for routing a sequence of packets via the telecommunications network.
In accordance with a further aspect of the invention, there is provided a method of managing the dispatch of a sequence of packets via a telecommunications network, the method comprising:
queuing packets for transmission;
comparing flow and packet sequence parameters of a new packet for transmission to flow and packet sequence parameters of queued packets; and
responding to detection of the new packet being a retransmission of a queued packet relating to the same flow path to discard either the new packet or the queued packet.
In accordance with a further aspect of the invention, there is provided a software dispatch mechanism on a storage medium for controlling the dispatch of a sequence of packets via a telecommunications network, the software dispatch mechanism being configured to be operable to define:
a queue for packets for transmission; and
a queue controller responsive to receipt of a new packet for transmission to compare parameters of the new packet to parameters of a packet already in the queue, the queue controller determining whether to queue or to drop the new packet depending on the result of the comparison(s).