1. Field of the Invention
This invention relates to packet routing and, more particularly, to efficient use of the resources available in a multi-threaded processor environment to search multiple fields of a packet for a matching routing rule.
2. Description of the Related Art
Many computing problems require searching multiple fields of a data packet for a match to one or more of a large set of rules. In packet routing, for example, multiple fields in a packet header may be searched to identify a matching routing rule. A commonly used set of routing rules is an access control list (ACL). An ACL may include a large number of rules, each of which specifies values and ranges of values for one or more fields that determine whether or not a particular action should be allowed. During routing of IP packets, the fields most commonly used in ACL determinations are the IP source address (SA), IP destination address (DA), source port, destination port, protocol ID, and differentiated services code point (DSCP). The actions that may be allowed or disallowed by various ACL rules include dropping a packet, forwarding a packet, routing a packet to the specified destination address, routing the packet through the specified destination port, establishing a priority for routing a packet, and many others.
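By way of illustration only, the matching problem described above can be sketched as a naive linear scan over an ordered rule list. The class name AclRule, the field names, and the packet dictionary keys are hypothetical and are not taken from any particular router implementation; the DSCP field is omitted for brevity, and first-match-wins ordering is an assumption.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional, Tuple


@dataclass(frozen=True)
class AclRule:
    # Hypothetical rule layout, for illustration only.
    src_net: str                 # source address prefix, e.g. "10.0.0.0/8"
    dst_net: str                 # destination address prefix
    src_ports: Tuple[int, int]   # inclusive (low, high) range
    dst_ports: Tuple[int, int]   # inclusive (low, high) range
    protocol: Optional[int]      # None acts as a wildcard
    action: str                  # e.g. "permit" or "deny"


def matches(rule, pkt):
    """Return True if every field of pkt falls within the rule's values/ranges."""
    return (
        ip_address(pkt["sa"]) in ip_network(rule.src_net)
        and ip_address(pkt["da"]) in ip_network(rule.dst_net)
        and rule.src_ports[0] <= pkt["sport"] <= rule.src_ports[1]
        and rule.dst_ports[0] <= pkt["dport"] <= rule.dst_ports[1]
        and (rule.protocol is None or rule.protocol == pkt["proto"])
    )


def first_match(rules, pkt, default="deny"):
    """Scan rules in order and return the action of the first matching rule."""
    for rule in rules:
        if matches(rule, pkt):
            return rule.action
    return default


rules = [
    AclRule("10.0.0.0/8", "0.0.0.0/0", (0, 65535), (80, 80), 6, "permit"),
    AclRule("0.0.0.0/0", "0.0.0.0/0", (0, 65535), (0, 65535), None, "deny"),
]
web_pkt = {"sa": "10.1.2.3", "da": "192.0.2.7", "sport": 40000, "dport": 80, "proto": 6}
ssh_pkt = dict(web_pkt, dport=22)
```

A scan of this kind costs time proportional to the number of rules for every packet, which is why large ACLs motivate the hardware and algorithmic approaches discussed in this section.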
A number of hardware-oriented solutions to the above problem have been implemented, such as using ternary content addressable memory (TCAM) for rule storage and retrieval. However, these solutions have several disadvantages. TCAM hardware uses more transistors per bit than SRAM. Extra chips are required compared to a software-oriented solution. In addition to requiring more hardware, TCAM hardware presents a problem with power scaling because all comparisons are performed in parallel. Also, TCAM operations may be slow, such as when the TCAM is accessed through programmed I/O of a network processor. Finally, a brute-force use of TCAM may require very large storage capacity to handle arbitrary ranges of values for individual fields.
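The storage cost of arbitrary ranges can be seen in a small sketch. A ternary entry can express only a prefix-style pattern (fixed leading bits followed by don't-care bits), so one common approach, assumed here for illustration, is to split each range into aligned prefixes. The function name range_to_prefixes and the 16-bit default width are illustrative choices only.

```python
def range_to_prefixes(lo, hi, width=16):
    """Split the inclusive range [lo, hi] into aligned (value, prefix_len) blocks.

    Each block is a power-of-two-sized, naturally aligned span that a single
    ternary entry could represent.
    """
    prefixes = []
    while lo <= hi:
        # Largest block permitted by the alignment of lo (lo == 0 is aligned
        # to the whole field).
        size = lo & -lo if lo else 1 << width
        # Shrink the block until it fits within the remaining span.
        while size > hi - lo + 1:
            size >>= 1
        # size is a power of two, so bit_length() - 1 is log2(size).
        prefixes.append((lo, width - (size.bit_length() - 1)))
        lo += size
    return prefixes


unprivileged = range_to_prefixes(1024, 65535)   # 6 entries
worst_case = range_to_prefixes(1, 65534)        # 30 entries for a 16-bit field
```

Under this assumed expansion, a single rule with worst-case ranges on both the source port and the destination port would need 30 x 30 = 900 entries, since the entry counts of independent range fields multiply.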
In modern computer systems, compute power and memory capacity may be abundant. For example, modern computer systems often utilize multiple processors executing in parallel to increase overall operating efficiency. A variety of configurations are possible including separate microprocessors, a single microprocessor that includes multiple cores, or a combination of the two. A typical multi-threaded microprocessor may support up to 8 GB of memory using 1 GB DIMMs. The availability of these compute resources suggests that an algorithmic solution to the ACL search problem may be desired.
In general, there are two classes of search algorithms to be considered: multidimensional search and divide-and-conquer search. Multidimensional searches consider the entire space of all of the relevant fields together. Divide-and-conquer searches perform an independent search on the space of each field and then combine the results to locate the desired ACL rule.
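As a minimal sketch of the divide-and-conquer class, each field can be searched independently to yield a bitmap of the rules it matches, and the bitmaps can then be intersected. Bitmap intersection is only one of several possible combining techniques and is assumed here for illustration; the function names and the convention that a lower rule index means higher priority are likewise assumptions. Each per-field search is written as a simple scan, where a practical system would likely use a trie or interval lookup.

```python
def field_bitmap(ranges, value):
    """Independent search of one field: bit i is set if rule i's range holds value."""
    bits = 0
    for i, (lo, hi) in enumerate(ranges):
        if lo <= value <= hi:
            bits |= 1 << i
    return bits


def classify(rule_fields, packet):
    """Combine per-field results; return the lowest matching rule index or None.

    rule_fields[f] lists every rule's inclusive (lo, hi) range for field f,
    and packet[f] is the packet's value for that field.
    """
    combined = -1  # all bits set, so the first AND keeps the first bitmap
    for ranges, value in zip(rule_fields, packet):
        combined &= field_bitmap(ranges, value)
    if combined == 0:
        return None
    # Isolate the lowest set bit and convert it to an index.
    return (combined & -combined).bit_length() - 1


# Three hypothetical rules over two fields: destination port and protocol ID.
rule_fields = [
    [(80, 80), (0, 1023), (0, 65535)],   # destination port ranges per rule
    [(6, 6), (6, 6), (0, 255)],          # protocol ID ranges per rule
]
```

The per-field searches share no state, so they could be assigned to separate threads, with only the final intersection requiring the results to be brought together.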
In multidimensional searches, all of the packet header fields may be searched to find the least-cost ACL rule that matches the fields. Searches may be performed over a tree-like data structure that stores the ACL rules, such as a grid-of-tries, extended grid-of-tries (EGT), extended grid-of-tries with path compression (EGT-PC), hierarchical intelligent cuttings (HiCuts), or some other trie variant. The more dimensions or fields that are combined in the algorithm, the deeper the trie, the higher the memory latency, and therefore the lower the performance of each thread. In divide-and-conquer searches, various techniques may be used to combine the independent search results. A tradeoff may be necessary between computational efficiency and the size of storage needed to support large, pre-computed data structures used in combining search results. Therefore, what is needed is a search algorithm that efficiently uses storage and processing resources.