In large networks having multiple interconnected devices, traffic between source and destination devices typically traverses multiple hops. In these networks, devices that process and communicate data traffic often implement multiple equal cost paths across which data traffic may be communicated between a source device and a destination device. In certain applications, multiple communications links between two devices in a network may be grouped together (e.g., as a logical trunk or an aggregation group). The data communication links of an aggregation group (referred to as “members”) may be physical links or alternatively virtual (or logical) links.
Aggregation groups may be implemented in a number of ways. For example, an aggregation group may be implemented using Layer-3 (L3) Equal Cost Multi-Path (ECMP) techniques. Alternatively, an aggregation group may be implemented as a link aggregation group (LAG) in accordance with the IEEE 802.3ad standard. In another embodiment, an aggregation group may be implemented as a Hi-Gig trunk. As would be appreciated by persons of skill in the art, other techniques for implementing an aggregation group may be used.
In applications using multiple paths between devices, traffic distribution across the members of an aggregation group should be as even as possible to maximize throughput. Network devices (nodes) may use load balancing techniques to distribute data traffic across the links of an aggregation group. A key requirement of load balancing for aggregation groups is that packet order must be preserved for all packets in a flow. Additionally, the techniques used must be deterministic so that packet flow through the network can be traced.
Hash-based load balancing is a common approach used in modern packet switches to distribute flows to members of an aggregation group. Such load balancing typically hashes a set of packet fields and uses the result to resolve which of a set of possible route choices to select (e.g., which member of an aggregation group). At every hop in the network, each node may have more than one possible next-hop/link that will lead to the same destination.
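By way of illustration only, hash-based member selection of the kind described above may be sketched as follows. The function name select_member, the choice of CRC-32 as the hash, the modulo reduction, and the example field values are all assumptions made for this sketch and are not taken from any particular device.

```python
import zlib

def select_member(packet_fields, members):
    """Pick one member of an aggregation group from a hash of packet fields.

    Hypothetical helper: CRC-32 and the modulo reduction are illustrative
    choices, not a description of any specific switch implementation.
    """
    # Serialize the flow-invariant fields into a single hash key.
    key = "|".join(str(field) for field in packet_fields).encode()
    # Reduce the hash value to an index into the list of members.
    return members[zlib.crc32(key) % len(members)]

# Example aggregation group with four members (hypothetical names).
members = ["link0", "link1", "link2", "link3"]

# Example flow key: source address, destination address, protocol, ports.
flow = ("10.0.0.1", "10.0.0.2", 6, 49152, 80)

first_choice = select_member(flow, members)
second_choice = select_member(flow, members)
```

Because the selection depends only on the hashed fields, every packet carrying the same field values resolves to the same member, which is what makes the selection deterministic.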
In such a network, each node selects a next-hop/link based on a hash of a set of packet fields that do not change for the duration of a flow. A flow may be defined by a number of different parameters, such as source and destination addresses (e.g., IP addresses or MAC addresses), TCP flow parameters, or any set of parameters that are common to a given set of data traffic. Using such an approach, packets within a flow, or within a set of flows that produce the same hash value, will follow the same path at every hop. Since the binding of flows to the next-hop/link is fixed, all packets of a flow traverse the same path, and packet order is preserved. However, this approach leads to poor distribution of multiple flows to aggregation group members and causes starvation of nodes (e.g., certain nodes in a multi-hop network may not receive any data traffic), particularly in large multi-hop, multi-path networks. The effect becomes more pronounced as one moves further away from the node (called the root node) at which the traffic entered the network.
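The starvation effect described above may be illustrated with a minimal sketch. The sketch assumes a hypothetical two-hop topology in which every node has two next-hop links and every node applies the identical hash function (CRC-32 here, chosen only for illustration) to the same packet fields. The names flow_hash and trace and the generated flow keys are likewise assumptions.

```python
import zlib
from collections import Counter

def flow_hash(flow):
    """Hash the flow-invariant fields (CRC-32 is an illustrative choice)."""
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key)

def trace(flow, fanout=2, hops=2):
    """Return the link index chosen at each hop.

    Every hop uses the same hash value and the same fan-out, so the
    same index is selected at every hop along the path.
    """
    h = flow_hash(flow)
    return tuple(h % fanout for _ in range(hops))

# 1000 hypothetical flows entering at the root node.
flows = [("10.0.0.1", "10.0.1.%d" % (i % 250), 6, 1024 + i, 80)
         for i in range(1000)]

# Count how many flows take each two-hop path (first link, second link).
paths = Counter(trace(flow) for flow in flows)

# Paths that carry no traffic at all. The mixed paths (0, 1) and (1, 0)
# are never taken, because a flow that chose link 0 at the root also
# chooses link 0 at the second hop, and likewise for link 1.
starved = [(a, b) for a in range(2) for b in range(2) if paths[(a, b)] == 0]
```

In this sketch, each second-hop node uses only one of its two links, so half of the available two-hop paths carry no traffic, however many flows enter at the root node.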
What is needed, therefore, are techniques that provide randomization and improved distribution of traffic across aggregation group members.
The present invention will be described with reference to the accompanying drawings. The drawing in which an element first appears is typically indicated by the leftmost digit(s) in the corresponding reference number.