A computer network is a geographically distributed collection of interconnected subnetworks for transporting data between nodes, such as computers. A local area network (LAN) is an example of such a subnetwork. The network's topology is defined by an arrangement of client nodes that communicate with one another, typically through one or more intermediate network nodes, such as a router or switch. As used herein, a client node is an endstation node that is configured to originate or terminate communications over the network. In contrast, an intermediate network node is a node that facilitates routing data between client nodes. Communications between nodes are typically effected by exchanging discrete packets of data according to predefined protocols. In this context, a protocol consists of a set of rules defining how the nodes interact with each other.
Each data packet typically comprises “payload” data to which at least one network header, formatted in accordance with a network communication protocol, has been prepended (i.e., the payload is “encapsulated”). The network headers include information that enables the client nodes and intermediate nodes to efficiently route the packet through the computer network. Often, a packet's network headers include at least a data-link (layer 2) header, an internetwork (layer 3) header and a transport (layer 4) header, as defined by the Open Systems Interconnection (OSI) Reference Model. The OSI Reference Model is generally described in more detail in Section 1.1 of the reference book entitled Interconnections, Second Edition, by Radia Perlman, published September 1999, which is hereby incorporated by reference as though fully set forth herein.
The data-link header provides information for transmitting the packet over a particular physical link (i.e., a communication medium), such as a point-to-point link, Ethernet link, wireless link, optical link, etc. To that end, the data-link header may specify a pair of “source” and “destination” network interfaces that are connected by the physical link. A network interface contains the mechanical, electrical and signaling circuitry and logic used to couple a network node to one or more physical links. A network interface is often associated with a hardware-specific address, known as a media access control (MAC) address. Accordingly, the source and destination network interfaces in the data-link header are typically represented as source and destination MAC addresses. The data-link header may also store flow control, frame synchronization and error checking information used to manage data transmissions over the physical link.
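For illustration only, the following minimal sketch assumes an Ethernet II link and shows how a data-link header carrying destination and source MAC addresses might be prepended to an already-formed layer 3 packet. The helper names (`mac_to_bytes`, `ethernet_header`, `encapsulate`) and the placeholder payload are hypothetical; the frame check sequence trailer used for error checking is omitted.

```python
import struct

ETHERTYPE_IPV4 = 0x0800  # EtherType value announcing an IPv4 payload


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' notation into 6 raw bytes."""
    parts = mac.split(":")
    if len(parts) != 6:
        raise ValueError(f"bad MAC address: {mac!r}")
    return bytes(int(p, 16) for p in parts)


def ethernet_header(dst_mac: str, src_mac: str, ethertype: int = ETHERTYPE_IPV4) -> bytes:
    """Build a 14-byte Ethernet II header: destination MAC, source MAC, EtherType."""
    return struct.pack("!6s6sH", mac_to_bytes(dst_mac), mac_to_bytes(src_mac), ethertype)


def encapsulate(dst_mac: str, src_mac: str, l3_packet: bytes) -> bytes:
    """Prepend the data-link header to a layer 3 packet (hypothetical helper)."""
    return ethernet_header(dst_mac, src_mac) + l3_packet


frame = encapsulate("00:11:22:33:44:55", "66:77:88:99:aa:bb", b"<ip packet>")
print(len(frame))  # 25: 14-byte header + 11-byte placeholder payload
```

The destination MAC is placed first so that a receiving interface can decide whether the frame is addressed to it before parsing the rest of the header.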
The internetwork header provides information defining the packet's logical path (or “virtual circuit”) through the computer network. Notably, the path may span multiple physical links. The internetwork header may be formatted according to the Internet Protocol (IP), which specifies IP addresses of both a source and destination node at the end points of the logical path. Thus, the packet may “hop” from node to node along its logical path until it reaches the client node assigned to the destination IP address stored in the packet's internetwork header. After each hop, the source and destination MAC addresses in the packet's data-link header may be updated, as necessary. However, the source and destination IP addresses typically remain unchanged as the packet is transferred from link to link in the network.
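A minimal sketch of the hop-by-hop behavior described above, assuming a hypothetical path host A, router R1, router R2, host B. The `Packet` type and the address labels (`mac-A`, `mac-R1a`, and so on) are illustrative placeholders, not real address formats; only the layer 2 fields are rewritten at each hop while the layer 3 fields pass through untouched.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src_mac: str  # data-link (layer 2) addresses: rewritten on every hop
    dst_mac: str
    src_ip: str   # internetwork (layer 3) addresses: unchanged end to end
    dst_ip: str


def forward_one_hop(pkt: Packet, out_iface_mac: str, next_hop_mac: str) -> Packet:
    """Rewrite only the layer 2 addresses for the next physical link."""
    return replace(pkt, src_mac=out_iface_mac, dst_mac=next_hop_mac)


# Host A sends to router R1's inbound interface.
pkt = Packet("mac-A", "mac-R1a", "10.0.0.1", "10.0.9.9")
# R1 forwards over its outbound interface toward R2.
pkt = forward_one_hop(pkt, "mac-R1b", "mac-R2a")
# R2 forwards over its outbound interface toward host B.
pkt = forward_one_hop(pkt, "mac-R2b", "mac-B")
print(pkt)
```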
The transport header provides information for ensuring that the packet is reliably transmitted from the source node to the destination node. The transport header typically includes, among other things, source and destination port numbers that respectively identify particular software applications executing in the source and destination nodes. More specifically, the packet is generated in the source node by the application assigned to the source port number. Then, the packet is forwarded to the destination node and directed to the application assigned to the destination port number. The transport header also may include error-checking information (i.e., a checksum) and other data-flow control information. For instance, in connection-oriented transport protocols such as the Transmission Control Protocol (TCP), the transport header may store sequencing information that indicates the packet's relative position in a transmitted stream of data packets.
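Because TCP and UDP headers both begin with a 16-bit source port followed by a 16-bit destination port, the port-based delivery described above can be sketched as follows. The `applications` table and the `deliver` function are hypothetical stand-ins for an operating system's socket demultiplexing, and the remainder of the sample TCP header is zero-filled placeholder data.

```python
import struct
from typing import Callable, Dict, Tuple


def parse_ports(l4_header: bytes) -> Tuple[int, int]:
    """Read the source and destination ports from the first 4 header bytes."""
    if len(l4_header) < 4:
        raise ValueError("transport header too short")
    src_port, dst_port = struct.unpack("!HH", l4_header[:4])
    return src_port, dst_port


# Hypothetical table mapping destination ports to local applications.
applications: Dict[int, Callable[[bytes], str]] = {
    80: lambda payload: f"web server got {len(payload)} bytes",
    53: lambda payload: f"DNS server got {len(payload)} bytes",
}


def deliver(l4_header: bytes, payload: bytes) -> str:
    """Hand the payload to whichever application owns the destination port."""
    _src, dst = parse_ports(l4_header)
    app = applications.get(dst)
    if app is None:
        return f"no application on port {dst}"
    return app(payload)


# Ports followed by the zero-filled remainder of a 20-byte TCP header.
hdr = struct.pack("!HH", 49152, 80) + bytes(16)
print(deliver(hdr, b"GET / HTTP/1.1\r\n"))  # web server got 16 bytes
```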
As used herein, a data flow is a stream of data packets that is communicated from a source node to a destination node. Each packet in the flow satisfies a set of predetermined criteria, e.g., based on the packet's contents, size or relative position (i.e., temporal or spatial) in the data flow. For example, the predetermined criteria may require each packet in the flow to satisfy one or more “range checks” performed on selected values stored in the packet's network headers. Here, a range is defined as a set of numbers whose values are between predetermined upper- and lower-bound values. Thus, a range check determines whether a selected value falls within the range, i.e., between the range's upper and lower bounds, inclusive. Accordingly, in this example, the packet may be classified as part of the data flow if the values stored in selected fields of the packet's headers satisfy the data flow's associated range checks.
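The inclusive range check and the resulting flow-membership test can be sketched as below. The field names (`dst_port`, `ip_proto`) and the example flow are assumptions chosen for illustration; a real classifier would extract these values from the packet's headers.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Range:
    lower: int
    upper: int

    def contains(self, value: int) -> bool:
        """Inclusive range check: lower <= value <= upper."""
        return self.lower <= value <= self.upper


def in_flow(header_fields: Dict[str, int], range_checks: Dict[str, Range]) -> bool:
    """A packet belongs to the flow only if every selected field passes its range check."""
    return all(
        field in header_fields and rng.contains(header_fields[field])
        for field, rng in range_checks.items()
    )


# Hypothetical flow: destination ports 1024-2047 carried over TCP (IP protocol 6).
flow_checks = {"dst_port": Range(1024, 2047), "ip_proto": Range(6, 6)}
print(in_flow({"dst_port": 2047, "ip_proto": 6}, flow_checks))  # True (bounds are inclusive)
print(in_flow({"dst_port": 2048, "ip_proto": 6}, flow_checks))  # False
```

A single exact value, such as the protocol number here, is simply a degenerate range whose upper and lower bounds are equal.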
An intermediate network node may be configured to perform “flow-based” routing operations so as to route each packet in a data flow in the same manner. The intermediate node typically receives data packets in the flow and forwards the packets in accordance with predetermined routing information that is distributed using a protocol, such as the Open Shortest Path First (OSPF) protocol. Because each packet in the flow is addressed to the same destination node, the intermediate node need only make one forwarding decision for the entire data flow, e.g., based on the first packet received in the flow. Thereafter, the intermediate node forwards packets in the data flow based on the flow's previously determined routing information (i.e., adjacency information). In this way, the intermediate node consumes fewer resources, such as processor bandwidth and processing time, than it would if it made a separate forwarding decision for every packet it receives in the data flow.
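The following sketch illustrates the "decide once per flow" idea with a small cache. It assumes, purely for illustration, that a flow is keyed by its (source IP, destination IP) pair and that an exact-match dictionary stands in for the full forwarding decision (which in practice would be, e.g., a longest-prefix match over OSPF-derived routes). The `FlowRouter` class is hypothetical.

```python
from typing import Dict, Tuple

FlowKey = Tuple[str, str]  # (source IP, destination IP); an assumed flow key


class FlowRouter:
    def __init__(self, routing_table: Dict[str, str]):
        self.routing_table = routing_table        # destination IP -> outbound interface
        self.flow_cache: Dict[FlowKey, str] = {}  # flow -> cached adjacency
        self.full_lookups = 0                     # counts expensive forwarding decisions

    def _route_lookup(self, dst_ip: str) -> str:
        """Stand-in for a full forwarding decision."""
        self.full_lookups += 1
        return self.routing_table[dst_ip]

    def forward(self, src_ip: str, dst_ip: str) -> str:
        key = (src_ip, dst_ip)
        if key not in self.flow_cache:
            # Only the first packet of the flow pays for the forwarding decision.
            self.flow_cache[key] = self._route_lookup(dst_ip)
        return self.flow_cache[key]


router = FlowRouter({"10.0.9.9": "eth1"})
for _ in range(1000):
    router.forward("10.0.0.1", "10.0.9.9")
print(router.full_lookups)  # 1
```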
In practice, the intermediate network node may implement a hash table which stores a plurality of ranges and other packet-related information used to classify received packets into their corresponding data flows. The hash table is typically organized as a table of linked lists, where each list may be indexed by the result of applying a conventional hash function to “signature” information. In this context, a signature is a set of values that remain constant for every packet in a data flow. For example, assume each packet in a first data flow stores the same pair of source and destination IP address values. In this case, a signature for the first data flow may be generated based on the values of these source and destination IP addresses. Likewise, a different signature may be generated for a second data flow whose packets store a different set of source and destination IP addresses than packets in the first data flow. Of course, those skilled in the art will appreciate that a data flow's signature information is not limited to IP addresses and may include other information, such as TCP port numbers, IP version numbers and so forth.
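A minimal sketch of signature generation and hash-table indexing, assuming an IPv4 signature made of the source and destination addresses and using CRC-32 (Python's `zlib.crc32`) as the conventional hash. The table size and the use of Python lists in place of linked lists are illustrative choices, not part of the described technique.

```python
import ipaddress
import zlib

NUM_BUCKETS = 1024  # hypothetical hash-table size


def signature(src_ip: str, dst_ip: str) -> bytes:
    """Signature built from values that stay constant for every packet in a flow."""
    return ipaddress.ip_address(src_ip).packed + ipaddress.ip_address(dst_ip).packed


def bucket_index(sig: bytes) -> int:
    """Hash the signature and reduce the result to a linked-list (bucket) index."""
    return zlib.crc32(sig) % NUM_BUCKETS


# Hash table organized as an array of lists (Python lists stand in for linked lists).
table = [[] for _ in range(NUM_BUCKETS)]

sig1 = signature("10.0.0.1", "10.0.9.9")  # first data flow
sig2 = signature("10.0.0.2", "10.0.9.9")  # second data flow, different source IP
table[bucket_index(sig1)].append(("flow-1", sig1))
table[bucket_index(sig2)].append(("flow-2", sig2))
```

Extending the signature with TCP port numbers or an IP version number would simply mean appending those values to the byte string before hashing.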
Each linked list in the hash table contains one or more entries, and each linked-list entry stores information corresponding to a particular data flow. Such information may include, inter alia, the data flow's associated signature information, one or more ranges for performing the data flow's range checks and a data-flow identifier (“flow ID”). Typically, each range stored in the linked-list entry is represented by a pair of upper- and lower-bound values. The flow ID identifies the particular data flow and also may be used to locate routing information associated with the data flow. To that end, the intermediate network node may maintain a data structure that maps flow ID values to the memory locations of their corresponding routing information, e.g., stored in the node's “in-core” memory. Alternatively, the flow ID values may directly incorporate the memory locations of their data flows' routing information.
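One possible layout for a linked-list entry and for the flow-ID-to-routing-information map is sketched below. The `FlowEntry` and `RouteInfo` types, the opaque `b"sigA"` signature, and the interface names are all hypothetical; a dictionary stands in for the node's mapping from flow IDs to in-core memory locations.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class RouteInfo:
    out_interface: str
    next_hop_mac: str


@dataclass
class FlowEntry:
    signature: bytes
    ranges: List[Tuple[int, int]]       # (lower, upper) bound pairs, one per range check
    flow_id: int
    next: Optional["FlowEntry"] = None  # link to the next entry in the same list


# Flow ID -> routing information (stands in for in-core memory locations).
routes_by_flow: Dict[int, RouteInfo] = {
    7: RouteInfo("eth1", "00:11:22:33:44:55"),
    8: RouteInfo("eth2", "66:77:88:99:aa:bb"),
}

# Two chained entries with the same signature but different port ranges.
head = FlowEntry(b"sigA", [(0, 1023)], flow_id=7,
                 next=FlowEntry(b"sigA", [(1024, 65535)], flow_id=8))
print(routes_by_flow[head.next.flow_id].out_interface)  # eth2
```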
When a packet is received by the intermediate network node, signature information is extracted from the packet's network headers and hashed using a conventional hash function, such as a cyclic redundancy check (CRC) function. The resultant hash value is used to index a hash-table entry which, in turn, references a linked list. Entries in the linked list are accessed sequentially until a “matching” entry is found, i.e., an entry storing both the extracted signature and a set of ranges containing one or more software-specified “target” values. Like the packet's signature, the target values may be extracted from selected fields in the packet's headers and therefore may include IP addresses, TCP port numbers, etc. In some cases, the extracted target values may overlap information in the packet's signature, although the target values more generally include any values for which range checks can be performed. Thus, even if a linked-list entry contains the packet's signature, the entry is not considered a “match” unless it also contains a set of ranges including each of the extracted target values. When a matching linked-list entry is located, the entry's stored flow ID is used to associate the received packet with a data flow, and the packet is routed in accordance with that flow.
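The conventional lookup just described can be sketched as a CRC-indexed array of singly linked lists that is walked sequentially. This sketch assumes that the i-th stored range is checked against the i-th target value; the names `insert`, `lookup`, and `NUM_BUCKETS`, and the opaque signatures, are illustrative only.

```python
import zlib
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

NUM_BUCKETS = 256  # hypothetical table size


@dataclass
class FlowEntry:
    signature: bytes
    ranges: List[Tuple[int, int]]       # one (lower, upper) pair per target value
    flow_id: int
    next: Optional["FlowEntry"] = None


buckets: List[Optional[FlowEntry]] = [None] * NUM_BUCKETS


def index(sig: bytes) -> int:
    return zlib.crc32(sig) % NUM_BUCKETS


def insert(sig: bytes, ranges: List[Tuple[int, int]], flow_id: int) -> None:
    """Push a new entry onto the head of the signature's linked list."""
    i = index(sig)
    buckets[i] = FlowEntry(sig, ranges, flow_id, buckets[i])


def lookup(sig: bytes, targets: Sequence[int]) -> Optional[int]:
    """Walk the list until an entry matches the signature AND contains every target."""
    entry = buckets[index(sig)]
    while entry is not None:
        if entry.signature == sig and len(entry.ranges) == len(targets) and all(
            lo <= t <= hi for (lo, hi), t in zip(entry.ranges, targets)
        ):
            return entry.flow_id
        entry = entry.next
    return None  # no matching flow


insert(b"sigA", [(0, 1023)], flow_id=7)
insert(b"sigA", [(1024, 65535)], flow_id=8)
print(lookup(b"sigA", [443]))   # 7
print(lookup(b"sigA", [8080]))  # 8
print(lookup(b"sigB", [443]))   # None
```

The signature comparison inside the loop is what keeps a packet from matching an entry that merely shares its hash bucket.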
Conventional flow-based routing, as described above, suffers the disadvantage that the intermediate network node may have to search a large number of linked-list entries before locating a matching entry for the received data packet. For instance, suppose a relatively large number of different data flows are associated with the packet's signature and the flows differ based on their associated sets of target-value ranges. In this case, the packet's hashed signature value indexes a linked list containing multiple entries that store the packet's signature—namely, one entry for each set of target-value ranges associated with the signature. For a large number of different ranges associated with the signature, the linked list may contain an exorbitant number of entries that may have to be sequentially searched before a matching entry can be found. Consequently, the process of searching the linked list and performing a comparison of each list entry's associated target-value range checks may consume an unreasonable amount of time and resources due to the large number of list entries that may have to be traversed.
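To make the cost concrete, the sketch below builds one list in which 1,000 entries share a single signature and differ only in a port range, then counts how many entries a sequential search examines. Python tuples in a list stand in for linked-list nodes, and the range width of 64 is an arbitrary assumption.

```python
from typing import List, Tuple

Entry = Tuple[bytes, int, int, int]  # (signature, lower, upper, flow_id)


def build_list(num_ranges: int, width: int) -> List[Entry]:
    """One entry per distinct port range, all sharing the same signature."""
    return [(b"sigA", i * width, (i + 1) * width - 1, i) for i in range(num_ranges)]


def search(entries: List[Entry], sig: bytes, target: int) -> Tuple[int, int]:
    """Return (flow_id, entries examined) for a sequential search; flow_id -1 means no match."""
    examined = 0
    for entry_sig, lo, hi, flow_id in entries:
        examined += 1
        if entry_sig == sig and lo <= target <= hi:
            return flow_id, examined
    return -1, examined


entries = build_list(num_ranges=1000, width=64)
print(search(entries, b"sigA", 5))       # (0, 1): match in the first entry
print(search(entries, b"sigA", 63_999))  # (999, 1000): worst case walks every entry
```

The number of comparisons grows linearly with the number of ranges associated with the signature, which is the source of the disadvantage noted above.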
This problem is compounded when two or more different signatures “collide” in the hash table. A plurality of signatures are said to collide when their hash values generate the same hash-table index. Therefore, linked-list entries associated with the colliding signatures are essentially merged into a single linked list. As such, list lengths in the hash table grow longer as the number of signature collisions increases. The combined effect of having multiple signatures collide and having some (or all) of those signatures associated with a plurality of different target-value ranges results in linked lists that become so lengthy that they require prohibitive amounts of time and resource consumption to search during conventional flow-based routing.
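A small sketch of how collisions compound list length. It deliberately uses a 4-bucket table and 8 signatures, each associated with 10 target-value ranges; by the pigeonhole principle at least two signatures must share a bucket, so some list holds at least 20 entries. The table size, signature strings, and range count are assumptions chosen only to force the effect.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4           # deliberately tiny so that collisions are guaranteed
RANGES_PER_SIGNATURE = 10


def index(sig: bytes) -> int:
    return zlib.crc32(sig) % NUM_BUCKETS


# 8 distinct signatures into 4 buckets: at least two must collide.
signatures = [f"flow-{n}".encode() for n in range(8)]

list_lengths = defaultdict(int)
for sig in signatures:
    # All of a signature's range entries land in the same list, so colliding
    # signatures' entries are merged into one longer list.
    list_lengths[index(sig)] += RANGES_PER_SIGNATURE

longest = max(list_lengths.values())
print(longest)  # at least 20
```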
There is therefore a need for a faster, more efficient technique for locating one or more desired ranges in a hash table, without having to traverse as many linked-list entries as conventionally required. The technique should reduce the amount of time and resources, such as processor bandwidth and processing time, that an intermediate network node consumes when performing flow-based routing.