1. Technical Field
The present invention relates to the exchange of messages between a plurality of interconnected computers. More particularly, the present invention relates to a method and apparatus for determining when all of the packets of a message sent by a source computer have arrived at a destination computer.
2. Description of the Prior Art
A computer network includes collections of interconnected computers, also known as multicomputers or clusters. FIG. 1 is a block diagram of a typical computer network. In the figure, several computers 19, 20 are shown connected to a network, also referred to as an interconnect fabric, which includes several network nodes 10-12, 16-18. Each node may in turn be connected to a hub 13, 15, and also to one or more routers 14, 21. In such networks, data are transferred from one computer to another in the form of messages, where each message consists of one or more fixed-size or limited variable-size packets. The packets are each transmitted from a source computer to a destination computer via the interconnect fabric. Such networks vary widely in the delivery guarantees they provide. Some networks (e.g. Ethernet) may drop or duplicate packets, while others guarantee that each packet is delivered exactly once. Because better performance is achievable when packets are routed around congestion or take varied or random routes to the destination, many high-performance interconnects do not route all packets that comprise a message by the same route. This gives rise to the possibility that packets taking one route arrive at the destination earlier than packets that were transmitted earlier but took a different (and slower or more congested) route. Thus, the packets may arrive at the destination computer out of sequence.
The present invention is in the context of interconnects that deliver each packet exactly once, but make no guarantees regarding delivery order.
The receiving computer normally consists of several components, each with a cost (in time) associated with its learning of or responding to an event. An interface card can learn of or respond to an event in almost zero time. The operating system can be informed of an event and run briefly by means of a procedure known as an interrupt, which may cost a few hundred cycles. Alternatively, the operating system may have to suspend whatever program is currently running and start up the application involved with the incoming message. The action of suspending one process and starting up another is called a context switch. Context switching can take thousands of cycles.
To avoid delays between when the data arrive and when they can be used, the data from each packet are deposited directly into their final destination addresses, rather than first being deposited into temporary buffers from which they must be copied to a final destination. In addition, the destination computer must be notified when all the packets comprising a message have arrived.
The problem is to determine when this has occurred.
This problem can be subdivided into several cases. The first is the case where there is only one sender, hereinafter referred to as the single-sender problem. Several solutions to the single-sender problem have been proposed, including:
1. Place a flag in the last packet of a message indicating that it is the last packet. This solution only works with systems that guarantee in-order delivery of packets. As discussed above, many interconnect fabrics do not guarantee in-order delivery.
2. Require the receiving computer to know in advance how many packets are sent. Place this value in a counter and decrement the counter as each packet arrives until the value in the counter equals zero. This solution does not allow the transmission of data whose size is not known in advance by the receiving computer.
3. Place a value equal to the total of the number of packets sent for a particular message in the first packet, then count down at the receiving computer as each packet is received until the count is equal to zero. Unfortunately, there are two problems with this solution:
a. The computer network must guarantee that the first packet actually arrives at the destination first, either by limiting the interconnect fabric to one that guarantees in-order delivery; or by having the receiving computer acknowledge receipt of the first packet, in which case the sending computer must wait for this acknowledgement before sending subsequent packets. This latter approach is undesirable because it introduces additional delay and complexity into the computer network; and
b. The computer network must determine the total number of packets that are sent before the first packet is launched.
A second problem, hereinafter referred to as the multiple sender problem, occurs in a computer network in the situation in which one computer receives messages from multiple source computers, and must determine when all of the packets comprising all these messages have arrived. The number of source computers may be known in advance by the destination computer, but each message may consist of a different number of packets. Several solutions to the multiple sender problem have been proposed, including:
1. Providing a counter at the destination computer that is initialized to the number of sources (i.e. source computers) from which messages are received. The choice of interconnect fabrics is then limited to those that guarantee in-order delivery. Each source computer sets a last-packet flag in the last packet of each message. Every time the destination computer receives a packet in which the last-packet flag is set, it decrements the counter. When the counter reaches zero, the destination computer is notified that all packets of all messages have arrived.
2. Requiring the destination computer to know in advance how many packets are sent. Place this value in a counter and decrement the counter when each packet arrives at the destination computer until the value in the counter equals zero. Unfortunately, this approach does not allow the transmission of data whose size is not known in advance.
3. Providing a vector containing a counter for each source computer. Each source computer then sends the first packet of its message, which includes a number that indicates the total number of packets in the message. It must be assured, for example by one of the methods described above for the single-sender case, that the first packet arrives before any of the other packets from the same source computer. As each packet arrives, its counter is decremented. When all the counters are zero, the destination computer is notified. In addition to those limitations described above, an additional disadvantage of this approach is that the maximum possible number of source computers is limited by the size of the vector of counters.
4. The destination computer receives the message from each source computer separately, using known methods. The destination computer counts the incoming messages and when the requisite number have arrived it commences processing. Because the destination computer must context switch to accept each message, this solution requires an excessive number of context switches at the destination computer.
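By way of illustration only, the vector-of-counters approach to the multiple sender problem described above may be sketched as follows. The class name `CounterVector`, its method `on_packet`, and the way a packet is represented are hypothetical and are not taken from any cited system. The sketch assumes, as that approach requires, that the first packet from each source computer arrives before any other packet from the same source.

```python
class CounterVector:
    """Hypothetical sketch: one countdown counter per source computer."""

    def __init__(self, num_sources):
        # None means the first packet from that source has not arrived yet.
        # The vector length fixes the maximum number of source computers.
        self.remaining = [None] * num_sources

    def on_packet(self, source, total_packets=None):
        """Record one packet; return True once every counter is zero.

        total_packets is carried only by the first packet of a message.
        """
        if self.remaining[source] is None:
            if total_packets is None:
                # The scheme fails if a later packet overtakes the first one.
                raise RuntimeError("first packet must arrive first")
            self.remaining[source] = total_packets
        self.remaining[source] -= 1
        return all(count == 0 for count in self.remaining)


receiver = CounterVector(num_sources=2)
events = [
    receiver.on_packet(0, total_packets=2),  # first packet from source 0
    receiver.on_packet(1, total_packets=1),  # only packet from source 1
    receiver.on_packet(0),                   # second packet from source 0
]
print(events)  # [False, False, True]
```

The `RuntimeError` branch makes the ordering limitation visible: if the interconnect fabric reorders packets, a counter can be touched before it has been initialized.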
A third problem, referred to hereinafter as the dynamic sender problem, occurs in a computer network when the number of source computers is not known by any one node. Rather, each receiving node knows of some set of sending nodes, but these nodes may have delegated responsibility for part of the application to yet other nodes.
These other nodes also have to send messages to the destination computer, and this can proceed recursively.
The obvious solution to the dynamic sender problem is to have each node that has delegated responsibility to other nodes collect the data from those nodes and forward this data to the destination computer. This can be done by any of the multiple sender techniques listed above. Alternatively, each node that delegates responsibility to other nodes can forward a list of those nodes to the node from which it received its responsibility. Eventually, the top level nodes forward the information to the destination computer, which can then use any of the techniques described above in connection with the multiple sender problem.
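The second alternative above, in which each delegating node forwards a list of its delegates toward the destination computer, amounts to a recursive walk over the delegation relationships. A minimal sketch follows, assuming the delegations can be represented as a mapping from each node to the nodes it delegated to. The function name `collect_senders` and the node labels are hypothetical.

```python
def collect_senders(node, delegates):
    """Return node plus every node it delegated to, directly or indirectly."""
    senders = [node]
    for child in delegates.get(node, []):
        # Delegation can proceed recursively, so the walk does too.
        senders.extend(collect_senders(child, delegates))
    return senders


# Hypothetical example: A delegated to C and D, and D delegated further to E.
delegates = {"A": ["C", "D"], "D": ["E"]}
top_level = ["A", "B"]  # the senders the destination knows about initially

all_senders = [s for n in top_level for s in collect_senders(n, delegates)]
print(all_senders)  # ['A', 'C', 'D', 'E', 'B']
```

Once the destination computer holds the complete list, the number of source computers is known and any of the multiple sender techniques can be applied.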
One proposed solution to the problem of assembling received packets into a message is described in connection with the Internet by D. Clark, IP Datagram Reassembly Algorithms, MIT Laboratory for Computer Science, Computer Systems and Communications Group, RFC 815, July 1982. The Internet consists of a collection of many computer networks, each of which has its own characteristics.
When an Internet Protocol (IP) packet is generated, it may have some size corresponding to the capability of the computer network in which it was generated. As the IP packet is routed to its destination, it may have to pass through another computer network that is not able to handle packets that large, so it is fragmented into smaller packets. There is no requirement in the Internet Protocol that such fragments do not overlap.
Clark discloses a technique that determines when all the packets comprising a full Internet packet have arrived. First, it is necessary to keep track of all the fragments.
Second, when a new fragment arrives, it may be combined with the existing fragments in a number of different ways. For example, it may precisely fill the space between two fragments, it may overlap with existing fragments, it may completely duplicate existing fragments, or it may partially fill a space between two fragments without abutting either of them. Thus, reassembly involves a complicated algorithm that tests for a number of different options.
A partially reassembled message consists of certain sequences of fragments that have already arrived, and certain areas that are to contain fragments still to come. These missing areas are referred to as holes. Each hole can be characterized by two numbers: the number of the first fragment in the hole, and the number of the last fragment in the hole. This pair of numbers is referred to as the hole descriptor. All of the hole descriptors for a particular message are gathered together in a hole descriptor list.
The general form of Clark's algorithm is as follows: When a new fragment of the message arrives, it can possibly fill in one or more of the existing holes. Each of the entries in the hole descriptor list is examined to see whether the hole in question is eliminated by this incoming fragment. If so, that entry is deleted from the hole descriptor list. Eventually, all of the fragments necessary to complete the message have arrived, such that every entry is eliminated from the hole descriptor list. At this point, the message has been completely reassembled and can be passed to higher protocol levels for further processing.
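The hole descriptor bookkeeping described above may be sketched as follows. This is a simplified illustration, not a reproduction of Clark's published procedure. The function name `apply_fragment` is hypothetical, and positions are counted in whatever unit the hole descriptors use. The sketch assumes that each fragment carries a flag indicating whether more fragments follow, and that the list starts with a single hole of unbounded length. Splitting a partially filled hole into smaller holes is a detail added here beyond the summary above.

```python
import math


def apply_fragment(holes, first, last, more_fragments):
    """Return the hole descriptor list after a fragment first..last arrives."""
    updated = []
    for hole_first, hole_last in holes:
        if first > hole_last or last < hole_first:
            # The fragment does not touch this hole; keep the descriptor.
            updated.append((hole_first, hole_last))
            continue
        # The fragment overlaps this hole: drop the old descriptor and keep
        # only the parts of the hole that are still missing.
        if first > hole_first:
            updated.append((hole_first, first - 1))
        if last < hole_last and more_fragments:
            updated.append((last + 1, hole_last))
    return updated


holes = [(0, math.inf)]  # nothing has arrived yet
history = []
# Fragments arrive out of order; the third one overlaps the first.
for first, last, more in [(4, 7, True), (8, 9, False), (0, 5, True)]:
    holes = apply_fragment(holes, first, last, more)
    history.append(list(holes))

print(history)  # [[(0, 3), (8, inf)], [(0, 3)], []]
```

The message is complete when the list is empty, which in this example occurs after the third fragment. Every arriving fragment requires a scan of the list and several comparisons, which illustrates why the reassembly is comparatively complicated.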
See also M. C. Lobelie, Datagrams Transferred As Message Trains In Local Area Networks, Interfaces in Computing, 2 (1984) pp. 131-146, which discloses a method for guaranteeing the integrity of a packet transfer sequence, where both the sender and the receiver know the structure of the message before the message is sent. The received packets are counted, and the receiver sends a single acknowledgment packet to the sender for an entire message.
It would be advantageous to provide a method and apparatus that determines when all of the packets in a message have been received, but that is not subject to the limitations of the known techniques discussed above.