The present invention is directed to the problem of detecting duplication of packets traversing various paths in a network of data processing nodes.
It is highly desirable that application programs operating in many areas, such as numerically intensive computing (NIC), be provided with interface mechanisms that are able to detect packet duplication in a data processing network. NIC application programs in particular operate in a fashion in which partial results are transmitted amongst the nodes dedicated to solving various problems. Accordingly, NIC application programs, and similar programs whose performance depends on the interchange of data packets, would benefit from methods that promote rapid transfer of data packets without the worry of packet duplication amongst the nodes. Many applications depend on a guarantee that completion of a message transfer is notified exactly once and that data buffers will not be overwritten after a notification has been signaled to the ULP (upper layer protocol) or the end user.
For example, duplication may occur during the time when a network detects a cycle and modifies the links that make up the spanning tree, which is a mechanism used for routing packets in a network. The term “spanning tree” is a graph theory concept that describes a connected subgraph of the interconnection graph for a set of nodes in the data processing system in which all nodes are present, together with a subset of the links, but which has no closed loops. It is noted that, for any given interconnection graph, the selection and/or determination of a spanning tree is not unique. Presently, Ethernet, which is one of several protocols used for data packet transmission, is one such network that may use spanning trees to determine data paths for the transmission of data packets. The spanning tree approach is employed to ensure that the same packets are not accepted by the destination network adapter twice. However, as the network topology changes (as it might as nodes are added to or dropped from a node set), there may be certain periods of time during which one or more transient cycles exist, so that a data packet reaches a destination node more than once until the cycle is detected by the routing mechanism and certain routes are deleted to ensure that a properly formed, new spanning tree is put into place. This problem is more particularly discussed below with reference to FIGS. 1 through 5.
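By way of illustration only, the relationship between cycles and duplicate delivery described above can be sketched in a few lines of Python. The four-node topology, the function names spanning_tree and count_deliveries, and the simple flooding rule (forward on every link except the one of arrival) are assumptions made for this sketch; they are not taken from any particular routing protocol or from the figures.

```python
from collections import deque


def spanning_tree(adj, root):
    """Build one spanning tree of the graph `adj` by breadth-first search.

    Returns a set of undirected links (frozensets of two node names).
    A different root or visiting order can yield a different, equally
    valid tree, which is why the choice of spanning tree is not unique.
    """
    seen = {root}
    tree = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in sorted(adj[node]):
            if neighbor not in seen:
                seen.add(neighbor)
                tree.add(frozenset((node, neighbor)))
                queue.append(neighbor)
    return tree


def count_deliveries(links, source, dest, max_hops=8):
    """Flood one packet from `source` over `links`; count arrivals at `dest`.

    Hypothetical forwarding rule: every node forwards the packet on all of
    its links except the one it arrived on; the destination absorbs it.
    """
    adj = {}
    for link in links:
        a, b = tuple(link)
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    arrivals = 0
    frontier = [(source, None)]  # (current node, node the packet came from)
    for _ in range(max_hops):
        next_frontier = []
        for node, came_from in frontier:
            for neighbor in adj.get(node, ()):
                if neighbor == came_from:
                    continue
                if neighbor == dest:
                    arrivals += 1
                else:
                    next_frontier.append((neighbor, node))
        frontier = next_frontier
    return arrivals


# Assumed example topology: a square A-B-D-C-A, which contains one cycle.
adj = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A", "D"}, "D": {"B", "C"}}
all_links = {frozenset((u, v)) for u in adj for v in adj[u]}
tree_links = spanning_tree(adj, "A")

print(count_deliveries(all_links, "A", "D"))   # cycle present
print(count_deliveries(tree_links, "A", "D"))  # spanning tree only
```

With the cycle present the packet reaches node D twice (once via B and once via C), whereas forwarding restricted to the spanning tree links delivers it once. A transient cycle during a topology change corresponds to the first case.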
An example of an application that requires NIC level duplicate packet detection is the RDMA (Remote Direct Memory Access) transport protocol, where the Upper Layer Protocol (ULP) expects there to be no duplication of packets in the network in order to work correctly (U.S. Pat. No. 7,478,138 entitled “Third Party, Broadcast, Multicast and Conditional RDMA Operations,” issued Jan. 13, 2009). For RDMA operation, the challenge is to effectively detect the duplication at the receiving side network adapter and to not post a completion notification for the message transfer when duplication of a packet of a message occurs in the network. One cannot depend on the Upper Layer Protocol to detect and discard the duplicate packets because the ULP is not engaged in parsing each packet of an RDMA message (that is done by the network adapter). The receiving side network adapter directly moves the packets of an RDMA message to the target buffer. An additional challenge is to keep the logic that is employed to determine duplication of packets very simple and fast without requiring complex state maintenance on the adapters. In addition, one should try to ensure that the transport can take advantage of physical switches with multiple routes between a pair of nodes; the transport should not require in-order delivery of packets.
Others have tried to solve the problem of duplicate data packet transport in different ways. For example, in U.S. Pat. No. 7,400,626 entitled “Processing a Duplicate Data Packet,” issued Jul. 15, 2008, there is described a method for detecting duplicate packets by checking a timestamp in the packet against the timestamp of the last good packet received and checking an event bit that indicates whether the device is in the active or inactive state. This is quite different than the present invention since the present invention involves no time stamps; however, the basic problem being solved is similar.
Additionally, the problem being addressed in U.S. Pat. No. 7,406,082 entitled “Sequence Number Schemes for Acceptance/Rejection of Duplicated Packets in a Packet-Based Network,” issued Jul. 29, 2008, is also similar to the problem being addressed by the present invention. However, the approach taken there is quite different, in that it employs sequence number schemes for the acceptance and/or rejection of duplicated packets in a packet based transmission environment. Also, in U.S. Patent Publication No. 2005/0078653 A1, entitled “A Method and Apparatus for Data Communications Over Multiple Channels,” published Apr. 14, 2005, the authors therein describe an approach that is directed to the communication of data over multiple channels using a method that uses sequence-number based duplicate detection as a foundation for “filtration” (that is, elimination) of duplicate packets. By way of contrast, however, the problems associated with the storage requirements needed for sequence-number based filtration are precisely what has motivated the development of the present invention. Other than the discussions therein related to packet duplication detection, the published patent application bearing U.S. Patent Publication No. 2005/0078653 A1 is not germane to the present application.
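To make the storage concern concrete, the following is a deliberately naive Python sketch of generic sequence-number based filtration. It is not the scheme of either reference cited above, nor the method of the present invention; the class name SeqNumberFilter and its methods are hypothetical. It assumes that packets may arrive out of order, so the receiver remembers every sequence number seen from each sender.

```python
class SeqNumberFilter:
    """Naive receiver-side duplicate filter keyed on (sender, sequence number).

    Illustrative assumption: because packets may arrive out of order, the
    receiver keeps every sequence number it has accepted, per sender.
    """

    def __init__(self):
        self.seen = {}  # sender id -> set of accepted sequence numbers

    def accept(self, sender, seq):
        """Return True for a first arrival, False for a duplicate."""
        accepted = self.seen.setdefault(sender, set())
        if seq in accepted:
            return False
        accepted.add(seq)
        return True

    def state_size(self):
        """Total number of sequence numbers the receiver must remember."""
        return sum(len(s) for s in self.seen.values())


flt = SeqNumberFilter()
# Packets 1, 3, 2 arrive out of order, then packet 3 arrives a second time.
results = [flt.accept("node1", seq) for seq in (1, 3, 2, 3)]
print(results)           # the final arrival is rejected as a duplicate
print(flt.state_size())  # state grows with every distinct packet accepted
```

In this sketch the retained state grows with the number of senders and with the number of distinct packets accepted from each. Practical schemes bound that state with windows or timers, at the cost of additional bookkeeping on the receiving side.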
The work described in U.S. Pat. No. 6,167,051 entitled “Network Node and Method of Packet Transfer,” issued Dec. 26, 2000, concerns scheduling and routing of multicast traffic and a means for avoiding the generation of duplicate packets. The work does not concern the filtration of duplicate data packets and is not germane to the present application.
The work described in U.S. Pat. No. 6,853,641 entitled “Method of Protecting Traffic in a Mesh Network,” issued Feb. 8, 2005, describes the purposeful transmission of duplicate packets to ensure high reliability over a network and the marking of the packets with sequence numbers so that the receiver can discard duplicates and recreate the original packet stream. As such, it is not only significantly different than the present invention, it actually teaches away from the main principles of the present invention, namely, the avoidance of duplicate data packet generation and transmission.
U.S. Pat. No. 5,610,595 entitled “Packet Radio Communication System Protocol,” issued Mar. 11, 1997, describes an approach to packet duplication detection based on a repeat count in the transmitted packet. Apart from this being a significant difference, the radio aspects of this system do not suggest one of the major causes for data packet duplication discussed herein, namely changes in the number and connections of nodes in the network.
U.S. Pat. No. 6,671,264 entitled “Method for Detecting Invalid Packets by Assigning Super Transaction Number,” issued Dec. 30, 2003, also is inapposite to the present invention, not only since it uses transaction numbers (similar to the sequence number approach) but also because it seeks in its operation to stifle the transmission of duplicate data packets at the source, as opposed to the problem arising from changes in the network interconnection graph structure.
The mechanism described in the present application is also different from the standard sliding window based protocols used to detect ghost and duplicate packets. In those protocols, the duplicate packets are inserted by the sender after a predetermined timeout. The duplicate packets inserted by the sender also have a special bit set to signify a duplicate transmission, and the receiver uses this bit for duplicate detection purposes.
There are many differences between the invention described herein and what is provided in the prior art. The present invention has the following superior attributes:
1. The receiving side does not require that duplicate packets be marked with a special bit.
2. There is no need for a sliding window protocol or the associated state maintenance. This mechanism eliminates the need for the receiving side to send periodic acknowledgements to ensure that the flow control window on the send side can be advanced.
3. In addition, the send side is not unnecessarily throttled when acknowledgements from the receive side are delayed, as often occurs in standard sliding window protocols.
4. In U.S. Pat. No. 7,406,082 referenced above, the receiver has to keep a timer to determine whether a packet has aged in the network beyond the expected time if it arrives out of sequence or if the sequence number of the arriving packet is less than the last sequence number that was received. No such expected-time based checking is needed on the receiving side in the present invention.
5. The overall efficiency of the present method, in terms of the order of instructions and the amount of necessary state information, is far superior to that of other approaches.