1. Field
The following description relates to a computer network and, more particularly, to load balancing when processing packets in a multi-processor system.
2. Description of the Related Art
Networking devices, for example, routers, switches, firewalls, and middleboxes, have evolved to accommodate a variety of functions at high bandwidth. Such devices are widely equipped with multi-core processors or processor arrays, which not only provide some level of programmability to meet functional requirements but also leverage parallelism to meet performance requirements. Efforts to maximize parallelism have led to a scheme of dividing an input packet stream into flows, that is, independent sets of packets that do not require synchronization or context sharing with other sets.
A flow distribution model using a hash tag is one of the most widely used approaches to the above scheme. A tag value is calculated for each ingress or egress packet by use of a general hash function, and the calculated tag value is used as an index of a core or a processor, such that the packet is forwarded to the core or processor having the corresponding index. The uniformity of the hash function ensures that flows are distributed across the processing engines with roughly the same probability; that is, the processing engines are evenly balanced in terms of the number of flows.
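The hash-tag distribution described above can be illustrated with a minimal sketch. The choice of CRC32 as the hash function, the five-tuple flow key, the modulo mapping, and the names `flow_tag`, `select_engine`, and `distribute_flows` are assumptions made only for illustration; the description itself specifies only "a general hash function" and the use of the tag as an index.

```python
import zlib
from collections import Counter


def flow_tag(src_ip, dst_ip, src_port, dst_port, proto):
    """Compute a hash tag for a packet from its flow key.

    Assumption: the flow key is the classic five-tuple and the hash is CRC32.
    Every packet of the same flow yields the same tag.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return zlib.crc32(key)


def select_engine(tag, num_engines):
    """Use the tag as an index of a processing engine (assumed: tag modulo N)."""
    return tag % num_engines


def distribute_flows(flows, num_engines):
    """Count how many flows are mapped to each processing engine."""
    counts = Counter()
    for flow in flows:
        counts[select_engine(flow_tag(*flow), num_engines)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical flows that differ only in source port.
    flows = [("10.0.0.1", "10.0.0.2", port, 80, "tcp") for port in range(1024, 5024)]
    for engine, count in sorted(distribute_flows(flows, 4).items()):
        print(f"engine {engine}: {count} flows")
```

Because the tag depends only on the flow key, all packets of a flow reach the same engine, so no synchronization between engines is needed for that flow. The sketch balances only the number of flows; it takes no account of how many packets each flow carries.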
In balancing the flow count in a networking device, there is a pitfall: the number of flows assigned to a processing engine does not determine the number of packets belonging to those flows. That is, even if the processing engines are completely balanced in terms of the flow count, the packet count or the byte count may be severely unbalanced across the processing engines to which packets are allocated.
Traffic bursts occurring on a short time scale, such as a round-trip time (RTT), that is, traffic bursts associated with load imbalance in the processing engines lasting several hundred milliseconds or less, can be effectively handled by using a packet buffer with tolerance of some delay, or can be prevented by overprovisioning the capacity of the processing engines. However, in order to cope with persistent overload due to load imbalance occurring in a non-stationary manner on a large time scale, an alternative to the distribution scheme is required.
The extent of load imbalance can also be greater on a large time scale than on a small time scale. The distribution of flow size, which is known to be heavy-tailed or Pareto-like, has a heavier tail than the distribution of flow rate, which is often observed to be consistent with a lognormal distribution. The distribution of packet load to be processed at each engine in a time window larger than a typical flow lifetime approximates the distribution of flow size, while the distribution of packet load in a small time window approximates the distribution of flow rate. Because a few very large flows account for a significant portion of the entire traffic, balancing based on flow count does not imply that the number of packets or bytes is roughly balanced.
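The gap between flow-count balance and packet-count balance can be illustrated with a small simulation. This is a sketch under stated assumptions, not a measurement: flow sizes are drawn from a Pareto distribution with a hypothetical shape parameter, flows are assigned round-robin so that the flow count is perfectly balanced by construction, and the names `pareto_flow_sizes`, `per_engine_load`, and `imbalance_ratio` are introduced only for this example.

```python
import random


def pareto_flow_sizes(num_flows, alpha, seed):
    """Draw flow sizes (in packets) from a Pareto distribution.

    Assumption: shape parameter alpha models the heavy tail; smaller alpha
    means a heavier tail. paretovariate() returns values >= 1.
    """
    rng = random.Random(seed)
    return [int(rng.paretovariate(alpha)) for _ in range(num_flows)]


def per_engine_load(flow_sizes, num_engines):
    """Assign flows round-robin and return (flow_counts, packet_counts).

    Round-robin stands in for an ideal hash: the flow count per engine
    differs by at most one, so any imbalance comes from flow sizes alone.
    """
    flow_counts = [0] * num_engines
    packet_counts = [0] * num_engines
    for i, size in enumerate(flow_sizes):
        engine = i % num_engines
        flow_counts[engine] += 1
        packet_counts[engine] += size
    return flow_counts, packet_counts


def imbalance_ratio(loads):
    """Ratio of the most loaded engine to the mean load (1.0 is perfect balance)."""
    return max(loads) / (sum(loads) / len(loads))


if __name__ == "__main__":
    sizes = pareto_flow_sizes(num_flows=1000, alpha=1.2, seed=1)
    flow_counts, packet_counts = per_engine_load(sizes, num_engines=4)
    print("flows per engine:  ", flow_counts)
    print("packets per engine:", packet_counts)
    print("packet imbalance ratio:", round(imbalance_ratio(packet_counts), 2))
```

As a deterministic illustration, take one flow of 10000 packets and seven flows of 10 packets each, spread over four engines. Every engine receives two flows, yet one engine processes 10010 packets while each of the others processes 20.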