With the growing need for network bandwidth and increasing complexity of the services provided by a Routing and Services system, the demand for processing power in a networking system has grown to a point where a single processor can no longer support the full network load. To scale the throughput of the system, Routing and Services systems consisting of multiple processors generally use a packet distribution unit to load balance network traffic among the processors.
If the services provided by a networking system require it to maintain the state of a flow across packets, a flow state is created during the processing of the first packet of the flow and saved in RAM for subsequent packets. After the flow is terminated and no more packets are expected for it, the memory occupied by the flow state is freed. A distributed-memory multi-processor system that maintains per-flow state often requires all packets in a flow to be processed by the same processor. The processor for a flow is chosen during the processing of the first packet of the flow, and from then on all subsequent packets are punted to this processor. A system consisting of multiple processors therefore requires a flow distribution unit, as depicted in FIG. 1, that can uniquely identify the processor assigned to a network packet and direct the packet to that processor in a flow-preserving manner.
A flow is a sequence of packets transferred between two endpoints, generally representing a single session using a known protocol. Accordingly, a flow is identified by a set of defined tuples. For routing purposes, a flow is identified by the two tuples that identify the endpoints, i.e., the source and destination addresses. For content-based services (load balancers, firewalls, intrusion detection systems, etc.), flows can be discriminated at a finer granularity by using five or more tuples, e.g., {source address, destination address, IP protocol, transport layer source port, transport layer destination port}. Each packet in a flow is expected to have the same set of tuples in the packet header.
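The two granularities of flow identification described above can be sketched as follows. This is only an illustration: the `PacketHeader` type, its field names, and the example addresses are assumptions made for the sketch and do not come from any particular system.

```python
from collections import namedtuple

# Hypothetical header representation; field names are illustrative only.
PacketHeader = namedtuple(
    "PacketHeader", "src_addr dst_addr ip_proto src_port dst_port"
)

def routing_flow_key(hdr):
    """Coarse flow identity for routing: the two endpoint addresses."""
    return (hdr.src_addr, hdr.dst_addr)

def service_flow_key(hdr):
    """Finer flow identity for content-based services: the five tuples."""
    return (hdr.src_addr, hdr.dst_addr, hdr.ip_proto,
            hdr.src_port, hdr.dst_port)

# Two packets of one TCP session, plus one packet of a second session
# between the same two endpoints (only the source port differs).
p1 = PacketHeader("10.0.0.1", "192.0.2.7", 6, 40000, 80)
p2 = PacketHeader("10.0.0.1", "192.0.2.7", 6, 40000, 80)
p3 = PacketHeader("10.0.0.1", "192.0.2.7", 6, 40001, 80)
```

Under the two-tuple key all three packets belong to one flow, whereas the five-tuple key separates the second session from the first.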
Flows are further described in detail in the following patent applications:
        U.S. Pat. No. 6,091,725, titled “Method For Traffic Management, Traffic Prioritization, Access Control, and Packet Forwarding in a Datagram Computer Network”, filed Dec. 29, 1995, in the name of inventors David R. Cheriton and Andreas V. Bechtolsheim, assigned to Cisco Technology, Inc.;
        U.S. Pat. No. 6,590,894, titled “Network Flow Switching and Flow Data Export”, filed May 28, 1996, in the name of inventors Darren Kerr and Barry Bruins, assigned to Cisco Technology, Inc.;
        U.S. Pat. No. 6,308,148, titled “Network Flow Switching and Flow Data Export”, filed Dec. 20, 1996, in the name of inventors Darren Kerr and Barry Bruins, assigned to Cisco Technology, Inc.; and
        U.S. Pat. No. 6,243,677, titled “Network Flow Switching and Flow Data Export”, filed Jul. 2, 1997, in the name of inventors Darren Kerr and Barry Bruins, assigned to Cisco Technology, Inc.
Each of these applications is hereby incorporated by reference for all purposes.
To load balance multiple processors in a flow-preserving manner, some existing flow distribution units maintain a flow table that is an aggregate of the flows serviced by all the processors within the system. FIG. 2 is a diagram depicting this type of system. The tuple information for each flow and the processor ID of the processor handling the flow form an entry of the table. When a packet is received, the flow table is searched for an entry that matches the tuples of the received packet. If an entry is matched, the processor ID of the processor handling the flow is output. Content Addressable Memories (CAMs) and Static Random Access Memories (SRAMs) are often utilized to maintain the flow tables. However, this approach does not scale well for systems with a large number of processors, because with a large number of flows the required memory size for this type of flow table becomes excessive. In addition, the excessive number of chips, high power dissipation, and high cost make the approach impractical.
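A software analogue of such an aggregate flow table is sketched below, purely for illustration. The `FlowTable` class and its method names are hypothetical, a Python dictionary stands in for the CAM/SRAM lookup, and the round-robin choice of processor for a new flow is an assumption; the text above does not specify how the processor is chosen.

```python
class FlowTable:
    """Aggregate flow table: one entry per active flow in the whole system."""

    def __init__(self, num_processors):
        self.num_processors = num_processors
        self.entries = {}   # flow tuples -> processor ID
        self._next = 0      # round-robin cursor (assumed assignment policy)

    def lookup(self, flow_tuples):
        """Return the processor ID for a packet, adding an entry on a miss."""
        pid = self.entries.get(flow_tuples)
        if pid is None:
            # First packet of the flow: assign a processor and remember it.
            pid = self._next
            self._next = (self._next + 1) % self.num_processors
            self.entries[flow_tuples] = pid
        return pid

    def terminate(self, flow_tuples):
        """Free the entry once no more packets are expected for the flow."""
        self.entries.pop(flow_tuples, None)

table = FlowTable(num_processors=4)
flow_a = ("10.0.0.1", "192.0.2.7", 6, 40000, 80)
flow_b = ("10.0.0.2", "192.0.2.7", 6, 40001, 443)
pid_a = table.lookup(flow_a)   # first packet of flow_a creates an entry
pid_b = table.lookup(flow_b)   # first packet of flow_b creates another
```

Because the table holds one entry for every active flow across all processors, its size grows with the total number of flows in the system, which is the scaling problem noted above.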
Other flow distribution units use a mechanism that performs a hashing function on some of the tuples of the packet to produce indicia. The complete hash range is divided into multiple segments and a processor is assigned to each segment. Based on the hash result, the segment is identified and the packet is punted to the processor tied to that segment. These types of systems are described in:
        U.S. application Ser. No. 09/053,237, titled “Router/Service processor scalability via Flow-based distribution of traffic”, filed Apr. 1, 1998, in the name of inventor Earl Cohen, assigned to Cisco Technology, Inc.;
        U.S. Pat. Nos. 6,111,877, filed Dec. 31, 1997, and 6,603,765, filed Jul. 21, 2000, each titled “Load Sharing Across Flows”, in the name of inventors Bruce A. Wilford and Thomas Dejanovic, assigned to Cisco Technology, Inc.; and
        U.S. Pat. No. 6,175,874, titled “Packet relay control method packet relay device and program memory medium”, filed Feb. 9, 1998, in the name of inventors Yuji Imai (Kawasaki, JP), Mitsuhiro Kishimoto (Shinjuku, JP), and Tsuneo Katsuyama (Machida, JP), assigned to Fujitsu Limited.
Each of these applications is hereby incorporated by reference for all purposes.
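The hash-and-segment mechanism can be illustrated with the following minimal sketch. The function names are hypothetical, CRC-32 is used only as a convenient stand-in for whatever hashing function a given system applies, and equal-width segments are an assumption.

```python
import zlib

HASH_RANGE = 2 ** 32  # zlib.crc32 returns an unsigned 32-bit value

def flow_hash(flow_tuples):
    """Hash the selected tuples of a packet (CRC-32 as a stand-in)."""
    key = "|".join(str(field) for field in flow_tuples).encode("ascii")
    return zlib.crc32(key)

def segment_for(hash_value, num_processors):
    """Map a hash value to one of num_processors equal-width segments."""
    return hash_value * num_processors // HASH_RANGE

def select_processor(flow_tuples, num_processors):
    """Stateless selection: the hash result alone determines the processor."""
    return segment_for(flow_hash(flow_tuples), num_processors)

flow = ("10.0.0.1", "192.0.2.7", 6, 40000, 80)
pid = select_processor(flow, num_processors=4)
```

No per-flow table is needed: every packet carrying the same tuples hashes into the same segment and therefore reaches the same processor.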
A stateless hash-based mechanism has a few drawbacks. Because the indicia produced by applying the hash function to the tuples of the packet solely determine the processor, if the tuples differ on the two network sides of the system, packets from the two sides may be processed by different processors. Having a different set of tuples on the two network sides of the system is a common scenario for systems providing NAT and NAPT services, which are described in RFC 3022, located on the “rfc-archive.org” website with the extension “/getrfc.rfc=3022”. A hash-based mechanism also fails to identify related flows of a flow, as the tuples that form the hash key may differ for the related flows. For example, an FTP control flow and its FTP data flows are related flows, and some systems require such flows to be processed by the same processor.
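The NAPT case can be made concrete with a small sketch. All addresses and ports below are hypothetical, and the direction-independent key (ordering the two endpoints before hashing) is an assumed refinement used here only to show that even such a key does not survive address translation.

```python
def direction_independent_key(src_addr, src_port, dst_addr, dst_port, ip_proto):
    """Order the endpoints so both directions of a flow give one hash key."""
    a, b = sorted([(src_addr, src_port), (dst_addr, dst_port)])
    return (a, b, ip_proto)

# Hypothetical endpoints: private host, NAPT public endpoint, remote server.
HOST = ("10.0.0.5", 40000)
NAT_PUBLIC = ("203.0.113.1", 61000)
SERVER = ("198.51.100.9", 80)
TCP = 6

# Inside network: outbound packet as seen before translation.
outbound_key = direction_independent_key(*HOST, *SERVER, TCP)

# Outside network: the returning packet is addressed to the translated
# endpoint, so its tuples differ from those seen on the inside.
inbound_key = direction_independent_key(*SERVER, *NAT_PUBLIC, TCP)

# Without translation, the returning packet would be addressed to HOST.
untranslated_inbound_key = direction_independent_key(*SERVER, *HOST, TCP)
```

Without translation the two directions yield the same key, but with NAPT the inside and outside keys differ, so a hash over them may select different processors for the two sides of the same flow.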
Another type of flow distribution unit punts packets to processors in a daisy-chained manner until each packet finds its assigned processor. This approach also does not scale to a large number of processors.
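The scaling cost of daisy-chaining can be seen in the following rough sketch, which assumes that each processor checks only its own local set of flows and passes unmatched packets to the next processor in the chain. The function name and data layout are hypothetical.

```python
def daisy_chain_deliver(flow_tuples, processors):
    """Pass a packet down the chain; return (processor index, hops taken).

    `processors` is a list in chain order, each element being the set of
    flows owned by that processor (an assumed representation).
    """
    for hops, owned_flows in enumerate(processors, start=1):
        if flow_tuples in owned_flows:
            return hops - 1, hops
    # No processor owns the flow; the packet traversed the whole chain.
    return None, len(processors)

flow = ("10.0.0.1", "192.0.2.7", 6, 40000, 80)

# Eight processors; the flow is owned by the last one in the chain.
chain = [set() for _ in range(8)]
chain[7].add(flow)
owner, hops = daisy_chain_deliver(flow, chain)
```

In the worst case a packet visits every processor before reaching its owner, so the per-packet forwarding cost grows linearly with the number of processors.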
Accordingly, improved techniques for routing flows between multiple processors that scale well as the number of processors increases are required in the field.