The present invention is related to the field of network communications, and more particularly to techniques for maintaining the ordering of packets in a stream of packets being sent to a destination network node from a source network node.
A common requirement for network communications is that packets or frames being transmitted by a source node arrive at a destination node in the same order as transmitted by the source node. For example, the contents of a long message may be distributed among several packets, and coherent parsing of the message at the destination node depends upon the ability to correctly re-assemble the message from its constituent packets. Alternatively, a message conveyed in one packet may provide information about a later-transmitted or earlier-transmitted message, and the correct identification of the related message relies upon receiving the related packets in the same order as originally transmitted.
One reason why it is possible for packets to be delivered out of order is the use of different mechanisms for providing packet-forwarding services in network devices such as switches. In particular, a change from one service mechanism to another can potentially result in out-of-order delivery. The problem arises when the new service is capable of delivering packets faster than the previous service, at least in the period right after the transition. One or more packets handled by the new service are delivered before previously transmitted packets still being processed by the previous service.
A known example of such a service transition can occur when a network device such as a switch learns the location of a network node that is receiving a stream of packets. When the switch does not have a valid mapping of a destination node address to an output port of the switch, the switch employs a technique known as "multicasting" to transmit packets toward the destination node. Packets received at an input port of the switch are placed on a multicast queue at the input port, and from the multicast queue the packets are forwarded to multiple output ports. Once the switch establishes a port mapping for the destination node, however, subsequently received packets are placed on a unicast queue specifically associated with the output port at which the destination node is known to be reachable. Under certain conditions, packets from the unicast queue are forwarded to the output port before previously transmitted packets waiting on the multicast queue. This operation can result in out-of-order delivery of a stream of transmitted packets.
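The reordering described above can be illustrated with a minimal Python sketch. The function name, the `learned_after` parameter, and the policy of servicing the unicast queue ahead of the multicast backlog are assumptions chosen to show the worst case; they do not describe any particular switch.

```python
from collections import deque


def deliver_without_ordering(packets, learned_after):
    """Queue packets by sequence number, then drain the unicast queue first.

    `learned_after` is a hypothetical parameter: the sequence number of the
    last packet received before the switch learns the port mapping.
    """
    multicast_q, unicast_q = deque(), deque()
    for seq in packets:
        # Packets received before the mapping is learned are multicast;
        # later packets go to the unicast queue for the known output port.
        if seq <= learned_after:
            multicast_q.append(seq)
        else:
            unicast_q.append(seq)

    delivered = []
    # Assumed worst case: the unicast path is serviced before the
    # multicast backlog has been forwarded.
    while unicast_q:
        delivered.append(unicast_q.popleft())
    while multicast_q:
        delivered.append(multicast_q.popleft())
    return delivered


print(deliver_without_ordering([1, 2, 3, 4], learned_after=2))  # [3, 4, 1, 2]
```

Packets 3 and 4 reach the output port ahead of packets 1 and 2, even though they were transmitted later.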
One prior approach to maintaining packet ordering is the use of "flush" protocols. When the conditions for a service transition are detected, a special packet known as a "flush" packet is sent through the previous service, and the new service is stalled until the flush packet or packets are looped back. In this manner it is guaranteed that any previously-transmitted packets being processed by the previous service have been delivered before any packets are delivered by the new service.
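The flush approach can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the `FLUSH` marker and the function name are hypothetical, and the loop-back through an external device is reduced to the marker emerging from the old queue.

```python
from collections import deque

FLUSH = object()  # hypothetical marker standing in for a flush packet


def drain_with_flush(old_q, new_q):
    """Stall the new service until the flush marker leaves the old service."""
    # The flush packet is sent through the previous service, behind
    # everything that service is still processing.
    old_q.append(FLUSH)

    delivered = []
    while True:
        item = old_q.popleft()
        if item is FLUSH:
            break  # flush packet came back: the old service is drained
        delivered.append(item)

    # Only now may the new service deliver its packets.
    delivered.extend(new_q)
    new_q.clear()
    return delivered


print(drain_with_flush(deque([1, 2]), deque([3, 4])))  # [1, 2, 3, 4]
```

Ordering is preserved, but the marker occupies a queue slot and the new service sits idle until the marker returns.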
While flush protocols are effective in maintaining packet ordering, they suffer from undesirable drawbacks. The flush packets themselves consume valuable network resources, and the looping back involves delay and requires the assistance of an external device. It would be desirable to maintain packet ordering in networks without incurring the resource and other penalties of flush protocols.
In accordance with the present invention, a packet order assurance mechanism is disclosed that operates without the need for explicit flush packets and their attendant delay and resource consumption.
According to the disclosed technique, each received packet or message is marked upon being queued to either a first queue or a second queue for forwarding to an output port. The marking indicates a service era during which the message is being queued. The service era is advanced whenever a service transition occurs and the first queue is non-empty. In one embodiment the first queue is a multicast queue used when a valid port mapping for the destination address of the packet does not exist.
The presence of packets on the first queue and the transfer of packets from the first queue to the output ports are monitored. A message is forwarded from the second queue to an output port only if either (1) the first queue was empty at the time of the transition to the service era in which the message was queued to the second queue, or (2) all messages on the first queue from a service era earlier than the service era of the message on the second queue are guaranteed to have been transferred to the output port. In this way, where the first queue is a multicast queue and the second queue is a unicast queue, it is guaranteed that no earlier-transmitted packets are present on the multicast queue when a packet is forwarded from the unicast queue.
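The era marking and the forwarding condition can be sketched as follows. This is an illustrative model only: the class and method names are hypothetical, and the "monitoring" of the first queue is implemented naively by scanning its contents rather than by any mechanism stated in the text.

```python
from collections import deque


class EraSwitch:
    """Hypothetical model of era-marked queuing; all names are illustrative."""

    def __init__(self):
        self.era = 0
        self.first_q = deque()   # entries are (era, packet)
        self.second_q = deque()  # entries are (era, packet)

    def enqueue_first(self, packet):
        # Mark the packet with the service era in which it is queued.
        self.first_q.append((self.era, packet))

    def enqueue_second(self, packet):
        self.second_q.append((self.era, packet))

    def service_transition(self):
        # The era advances only if the first queue is non-empty.
        if self.first_q:
            self.era += 1

    def can_forward_second(self):
        if not self.second_q:
            return False
        head_era, _ = self.second_q[0]
        # Eligible only when no packet from an earlier era is still
        # waiting on the first queue.
        return all(era >= head_era for era, _ in self.first_q)

    def forward_first(self):
        return self.first_q.popleft()[1]


sw = EraSwitch()
sw.enqueue_first("p1")
sw.enqueue_first("p2")
sw.service_transition()         # first queue non-empty: era becomes 1
sw.enqueue_second("p3")         # marked with era 1
print(sw.can_forward_second())  # False
sw.forward_first()
sw.forward_first()
print(sw.can_forward_second())  # True
```

If the first queue is empty at the transition, the era does not advance, so the packet on the second queue carries the same era as anything still queued and is immediately eligible.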
In a disclosed embodiment, a synchronization flag variable is used to track the contents of the multicast queue. The synchronization flag is updated whenever a packet is forwarded from the multicast queue. The synchronization flag is updated to either the current service era or the service era of the packet being forwarded. The synchronization flag is used to control the forwarding of packets from the unicast queue. In particular, a packet on the unicast queue must be marked with an era no later than the value of the synchronization flag in order to be forwarded.
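A minimal sketch of the synchronization flag follows. The text states that the flag takes either the current service era or the era of the packet being forwarded, but does not state here when each value is used. The rule below (current era if the multicast queue becomes empty, otherwise the forwarded packet's era) is an assumption, as are all class and method names.

```python
from collections import deque


class SyncFlagSwitch:
    """Hypothetical model of the synchronization flag; names are illustrative."""

    def __init__(self):
        self.era = 0
        self.sync_flag = 0
        self.multicast_q = deque()  # entries are (era, packet)
        self.unicast_q = deque()    # entries are (era, packet)
        self.delivered = []

    def receive(self, packet, mapping_known):
        # Mark each packet with the current service era as it is queued.
        queue = self.unicast_q if mapping_known else self.multicast_q
        queue.append((self.era, packet))

    def learn_mapping(self):
        # Service transition: advance the era only if multicast is non-empty.
        if self.multicast_q:
            self.era += 1

    def forward_multicast(self):
        era, packet = self.multicast_q.popleft()
        # Assumed rule: if the queue is now empty, nothing older remains,
        # so the flag jumps to the current era; otherwise it takes the era
        # of the packet just forwarded.
        self.sync_flag = era if self.multicast_q else self.era
        self.delivered.append(packet)

    def try_forward_unicast(self):
        # A unicast packet is forwarded only if its era is no later than
        # the synchronization flag.
        if self.unicast_q and self.unicast_q[0][0] <= self.sync_flag:
            self.delivered.append(self.unicast_q.popleft()[1])
            return True
        return False


sw = SyncFlagSwitch()
sw.receive("p1", mapping_known=False)
sw.receive("p2", mapping_known=False)
sw.learn_mapping()                    # era advances to 1
sw.receive("p3", mapping_known=True)  # marked with era 1
print(sw.try_forward_unicast())       # False: flag is still 0
sw.forward_multicast()                # p1 sent; flag stays 0
sw.forward_multicast()                # p2 sent; queue empty, flag becomes 1
print(sw.try_forward_unicast())       # True
print(sw.delivered)                   # ['p1', 'p2', 'p3']
```

Because the multicast queue is first-in, first-out, forwarding a packet from a given era implies that all packets from earlier eras have already left the queue, which is why a single flag can replace a scan of the queue contents.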
Other aspects, features, and advantages of the present invention are disclosed in the detailed description that follows.