The present invention relates to associative search engines (ASEs) and, in particular, to associative search engines for performing operations on multi-dimensional entries having associated data or actions, in response to a multi-dimensional input key.
Packet classification, a well-known problem in computer science, can be defined as follows: given a data packet with certain predefined multiple components, also called packet fields, determine the action to be taken on the packet on the basis of the values of those fields. Classification is the act of sorting, among all the data packets, those packets that result in the same action.
The set of rules that determines the class to which the packet belongs, is stored in an information base called Classification Information Base (CIB) or a Classifier. The relevant components or fields of a specific data packet classification are assembled into a construct called a Classification Key. The Classification Key forms a CIB query, which results in a specific action to be taken on the packet. Since the Classification Key incorporates multiple fields, and since the CIB results are multiple-field dependent, the result is defined as a Multi-Field Classification.
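The relationship between a Classification Key, the rule set, and the resulting action can be illustrated by the following minimal sketch. The field names, rule format, and first-match-wins policy are hypothetical simplifications for illustration, not a description of any particular Classifier implementation:

```python
# Hypothetical multi-field classifier sketch: each rule constrains several
# packet fields and maps to an action. None acts as a wildcard ("match any").
RULES = [
    ({"src_ip": "10.0.0.1", "dst_port": 80,   "proto": "tcp"}, "forward_fast"),
    ({"src_ip": None,       "dst_port": 22,   "proto": "tcp"}, "drop"),
    ({"src_ip": None,       "dst_port": None, "proto": None},  "forward_default"),
]

def classify(packet):
    """Return the action of the first rule whose fields all match the key."""
    for fields, action in RULES:
        if all(v is None or packet.get(k) == v for k, v in fields.items()):
            return action
    return "no_match"

# The Classification Key is the assembly of the relevant packet fields:
key = {"src_ip": "192.168.1.5", "dst_port": 22, "proto": "tcp"}
print(classify(key))  # -> drop
```

Because the key incorporates multiple fields and the result depends on all of them, this is a (toy) Multi-Field Classification.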
CIB structuring, classification methods, and CIB maintenance are considered a very important but difficult problem, and have been the subject of extensive research for the last 30 years. Computational Geometry, a relatively new branch of mathematics, emerged in the late 1970s and has been used extensively to explore various alternatives (J. Goodman et al., “Discrete and Computational Geometry”, CRC Press, 1997; de Berg et al., “Computational Geometry: Algorithms and Applications”, Springer-Verlag, 2000).
These topics began to receive special attention once it became clear that classification is of strategic importance in data communications. The recognition of classification as a performance and intelligence bottleneck grew with the Internet's performance and functionality demands. This bottleneck was noticed in the early 1990s and began to receive significant academic attention in the second half of that decade. A flood of research has since been published on the topic (P. Gupta et al., “Algorithms for Packet Classification”, IEEE Network, March/April 2001, pp. 24-32).
A very interesting approach is taken by T. Lakshman et al. in “High-Speed Policy-based Packet Forwarding Using Efficient Multi-dimensional Range Matching”, ACM Computer Communication Review 28(4), pp. 203-214, ACM SIGCOMM '98 (September 1998). This approach differs from the Trie-based approach, and utilizes so-called “bit parallelism”. The use of multi-dimensional range matches is innovative; however, the design practically limits the implementation hardware to several thousand rules and to a classification rate no higher than several million classifications per second.
Today, there are two main-stream approaches to Classifier design (see Lakshman, et al.) in a router or a switch: algorithmic and TCAM.
The Algorithmic Approach
In the algorithmic approach, the CIB consists of either:
- A general-purpose microprocessor, a micro-controller, or a network processor, which executes the algorithm embedded in a low-performance/low-cost memory. The microprocessor fetches data from the memory and decides from where to fetch memory data in the next step. This continues for many steps until the classification is completed. Since the performance requirements are not very high, a cheap but reasonable solution can be worked out (see Decasper, et al., “Router plugins: a software architecture for next-generation routers”, IEEE/ACM Trans. Networking, 8(1):2-15, February 2000); or
- A dedicated ASIC or Search Engine. This type of solution is based upon a specially built processor optimized for fast and efficient execution of the classification task. To this end, the search engine incorporates specialized hardware, which executes an algorithmic step in a single clock cycle. Also, the engine interfaces with the memory via a very wide data bus, in order to reduce the number of steps. This facilitates bringing in a sufficient amount of data to make a more intelligent step towards a solution (see C. Semeria, “Implementing a Flexible Hardware-based Router for the New IP Infrastructure”, Juniper Networks, September 2001).
The algorithms typically used in both cases are based on Trie data structures (see Gupta, et al.). All of these are multiple-step algorithms, some of which execute a search in fewer steps than others. Typically, those algorithms that are extremely fast, completing a classification in very few clock cycles, cause an exponential explosion in the requisite amount of storage space. This results in a very bulky, power-consuming solution. Those algorithms that are optimized for low storage requirements are very slow, and do not meet the performance criteria of high-performance routers.
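The multiple-step nature of trie-based search can be sketched as follows for a single field (e.g. an address prefix). This is an illustrative binary trie, not the structure of any specific product; it shows why a lookup costs roughly one memory access per key bit:

```python
# Minimal binary trie sketch: each lookup step follows one bit of the key,
# so classification time grows with key width (one memory access per step).
class TrieNode:
    def __init__(self):
        self.children = [None, None]  # one child per bit value
        self.action = None            # set if a rule (prefix) ends here

def insert(root, bits, action):
    """Store a rule for the given prefix bits."""
    node = root
    for b in bits:
        if node.children[b] is None:
            node.children[b] = TrieNode()
        node = node.children[b]
    node.action = action

def longest_prefix_match(root, bits):
    """Walk the key bit by bit, remembering the last rule seen."""
    node, best = root, None
    for b in bits:
        if node.action is not None:
            best = node.action
        node = node.children[b]
        if node is None:              # dead end: fall back to last match
            break
    else:
        best = node.action or best
    return best

root = TrieNode()
insert(root, [1, 0], "rule_A")        # prefix 10*
insert(root, [1, 0, 1, 1], "rule_B")  # prefix 1011*
print(longest_prefix_match(root, [1, 0, 1, 1, 0]))  # -> rule_B
```

The `break` branch above is a simple form of the backtracking problem: when the walk reaches a dead end, the search must fall back to an earlier partial match.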
In most cases, there is an additional problem due to backtracking, which results when an algorithm reaches a dead-end and the search must either start over in a completely different direction, or backtrack one or more steps and restart from there.
Due to the restricted memory bus bandwidth, these methods also end up taking a long time for CIB maintenance.
The Ternary CAM (TCAM) Approach
The TCAM approach is quite popular currently, especially for high-performance classification duties (see M. Peng et al., “Content-Addressable Memory (CAM) and its network applications”, International IC—Korea Conference proceedings, Altera International Ltd.). The TCAM advantage lies in the capability of performing one classification in every clock cycle. Consequently, TCAMs are considered to be the fastest classification solution available today. In addition, TCAMs have a deterministic classification time, as they always classify in a single clock cycle. TCAMs, however, are not problem-free:
- Since TCAMs compare all entries at once with the Classification Key, they consume excessive amounts of power and tend to overheat.
- Despite the ability to perform one classification per clock cycle, TCAMs are relatively slow in comparison to SRAMs (or DRAMs) developed using the same process technology and the same circuit design style. A TCAM can run classifications at about one fifth the rate of a similar SRAM, due to the TCAM cell complexity and density.
- TCAMs are not scalable in width. The widest classification word that a modern TCAM supports today is 576 bits. Hence, there is a limit to the number of fields, and the field widths, that TCAM-based Classifiers can handle. The scalability limitations of TCAM-based CIBs often constrain router intelligence and performance.
- TCAMs are architecturally limited in the number of classification rules they handle, because only a limited number of TCAM components can be used in a single CIB. Today, this is not an issue, due to the limited number of classification rules in a Classifier; however, it may become an issue in a year or two.
- TCAMs are storage-inefficient. A TCAM word can be programmed to one of several fixed widths. For instance, if a TCAM supports key widths of 36, 72, 144, 288 and 576 bits, and the CIB requires 128-bit keys, then 16 bits are unutilized in every key entry.
This waste is typically worse for longer classification keys. TCAMs are also inherently limited in the way that they handle fields: range boundaries must align to powers of 2, rather than to arbitrary integers. This does not make any rule inexpressible, but an arbitrary range must be expanded into multiple power-of-2 aligned entries, which can be highly storage-inefficient and therefore impractical.
- In certain cases, a precisely expressed classification rule requires a flag-based logical expression. It is not economically viable to support such expressions inside the TCAM. Therefore, designers must resort to external hardware to construct such an expression and then use the output signal as a bit to drive the TCAM. This is not only expensive but hardwired, and therefore limits the flexibility of the solution. Ideally, one would be able to create any flag-based logical expression through reprogramming, since this would enable great flexibility and incremental improvement via new software versions rather than changes in hardware.
- Thus, an 18 Mbit ternary TCAM, which theoretically stores over 147,000 classification rules at 128 bits per rule, statistically fits only about 80,000 classification rules. Nonetheless, this is considered quite reasonable when compared to the limitations of the fast algorithmic trie-based methods.
- It is important to remember that the wider the classification key, the slower the TCAM. For instance, a state-of-the-art TCAM configured for a word width of 144 bits runs on a 66 MHz clock; the same TCAM, configured for a word width of 576 bits, operates on a 25 MHz clock.
- TCAM maintenance is not autonomous. An external processor manages the TCAM address space. This occupies a great deal of processor bandwidth, and also requires the writing and verification of maintenance software drivers.
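The storage cost of expressing arbitrary ranges with power-of-2 aligned entries can be sketched as follows. The entry model (a value/mask pair per power-of-2 block) is a standard textbook simplification of ternary matching, and the 16-bit port width is an illustrative assumption:

```python
# Sketch of range-to-prefix expansion: a ternary entry is a (value, mask)
# pair that matches one power-of-2 aligned block, so an arbitrary range
# [lo, hi] may require many entries.
def range_to_prefixes(lo, hi, width=16):
    """Split [lo, hi] into minimal power-of-2 aligned (value, mask) entries."""
    entries = []
    while lo <= hi:
        # largest aligned block starting at lo that still fits in [lo, hi]
        size = (lo & -lo) if lo else (1 << width)
        while size > hi - lo + 1:
            size >>= 1
        mask = ((1 << width) - 1) ^ (size - 1)  # 1-bits = bits that must match
        entries.append((lo, mask))
        lo += size
    return entries

def tcam_match(value, entries):
    """Ternary match: compare only the cared-about (mask=1) bit positions."""
    return any(value & mask == v & mask for v, mask in entries)

# The port range 1024-65535 expands into 6 entries; 1-65535 needs 16.
print(len(range_to_prefixes(1024, 65535)))  # -> 6
print(len(range_to_prefixes(1, 65535)))     # -> 16
```

A rule constraining several fields by such ranges multiplies these counts together, which is why arbitrary ranges can quickly exhaust TCAM storage.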
There is therefore a recognized need for, and it would be highly advantageous to have, a device for, and a method of performing operations on multi-dimensional entries having associated data or actions characterized by high storage efficiency and reduced power consumption, with respect to prior art devices and methods. It would be of further advantage if the inventive device and method would provide unlimited scalability, both horizontally (in terms of the number of fields and field width) and vertically (in terms of the number of classification rules), and in addition, would have a classification clock that is independent of the classification key width.