The present invention relates generally to static information storage and retrieval systems, and more particularly to parallel data word search engines employing associative memories, which are also referred to as content addressable memory or tag memory.
Table lookup or database search functionality has been a common requirement, desirably implemented in hardware, for many years. Though many systems today require search functionality, it is particularly vital in high performance networks. Modern networks must have high performance search capability because they need to process high volumes of packet data, with each packet typically requiring many search operations. It is therefore imperative to improve the search capability if higher performance networks and search-based systems in general are desired.
For purposes of this discussion one can term the hardware unit used for table lookup or database search functionality a search engine. This search engine is the device or component that accepts data as input, searches that input data against a stored database, and returns a search result. Search engines are becoming very important in many applications that require high-speed lookup. For example, continuing with the network industry, switches and routers need to look up the address field of incoming data packets in order to forward the packets to appropriate destinations. Advanced network equipment also needs to look up priority, service quality, and other fields of the packet in addition to the address field in order to service the packet with appropriate delivery quality. Data compression equipment needs to look up data in order to find a compressed code for the data it replaces. These and other applications all demand high-speed lookup performance to keep up with ever-increasing requirements.
Content addressable memory (CAM) is increasingly used in such search engines today. It is a type of memory that accepts data as input and returns an address as its output. This is in contrast to normal memory, which takes an address as an input and returns data stored at that address as an output.
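The contrast between the two kinds of memory can be sketched in software. The following is only an illustrative model, not a description of any particular device: the names `ram_read` and `cam_search` and the stored words are hypothetical, and a real CAM compares all entries in parallel in hardware rather than looping over them.

```python
# Illustrative software model only; names and contents are hypothetical.
stored_words = ["alpha", "bravo", "charlie"]


def ram_read(address):
    """Normal memory: an address goes in, the data stored there comes out."""
    return stored_words[address]


def cam_search(word):
    """CAM: data goes in, the address of a matching entry comes out.

    Returns None when no stored word matches. The loop stands in for
    the parallel comparison a hardware CAM performs.
    """
    for address, stored in enumerate(stored_words):
        if stored == word:
            return address
    return None
```

Here `ram_read(1)` yields the word stored at address 1, while `cam_search("charlie")` yields the address at which that word is stored.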
FIG. 1 (background art) is a block diagram illustrating a conventional and very basic current CAM architecture. A typical CAM 1 contains three logic blocks: a CAM array block 2, a match detection block 3, and a priority encoder block 4. The CAM 1 receives a data input 5, a data sample often termed a “word” even though its size is not standard and in modern usage it is often quite long. The CAM array block 2 contains CAM cells and comparison logics, and a “database” of pre-stored content words which are potential matches with words that may be received as data inputs 5. When the CAM 1 receives a word at its data input 5 the CAM array block 2 processes this to produce sets of bit signals 6, one such bit signal 6 set for each content word compared against.
The match detection block 3 contains logics and sense amplifiers which determine from these sets of bit signals 6 if such a word being processed has any matches. The match detection block 3 produces a set of match signals 7, including one such match signal 7 for each content word (comparand) compared against.
The priority encoder block 4 contains logics to process the set of match signals 7 and to determine from it if any matches of a received word are indicated, and to pick among all such matches to establish one as having priority according to a pre-established rule. The CAM 1 then outputs the address of the highest priority match as a result output 8.
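The three blocks of FIG. 1 can be modeled as three small functions chained together. This is a minimal sketch under stated assumptions: the 8-bit word width, the function names, and the "lowest address wins" priority rule are all chosen here for illustration, since the text only says that priority follows a pre-established rule.

```python
WIDTH = 8  # assumed word width in bits, for illustration only


def cam_array(contents, word):
    """CAM array block: one set of bit signals per content word.

    Each bit signal is True where the stored bit equals the input bit.
    """
    return [
        [((stored >> bit) & 1) == ((word >> bit) & 1) for bit in range(WIDTH)]
        for stored in contents
    ]


def match_detection(bit_signals):
    """Match detection block: one match signal per content word."""
    return [all(bits) for bits in bit_signals]


def priority_encoder(match_signals):
    """Priority encoder block: pick one match by an assumed rule.

    Assumption: the lowest matching address has the highest priority.
    Returns None when no match signal is set.
    """
    for address, matched in enumerate(match_signals):
        if matched:
            return address
    return None


def cam_search(contents, word):
    """Chain the three blocks: data input to result output."""
    return priority_encoder(match_detection(cam_array(contents, word)))
```

With contents `[0x12, 0xA5, 0x3C, 0xA5]`, a search for `0xA5` matches at addresses 1 and 3, and the assumed priority rule reports address 1.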
FIG. 2a (background art) is a block diagram illustrating a newer generation CAM 9, including four CAM modules 10. As was the case for the CAM 1 of FIG. 1, a data input 11 and a result output 12 are provided. Such newer generation CAMs 9 offer more flexibility for sample word “depth” and “width” configuration control. Instead of one big CAM array, multiple CAM modules 10 are placed on an integrated circuit (IC) and each CAM module 10 is able to support multiple configurations. The data input 11 and result output 12 accordingly support the configured depth and width.
FIG. 2b (background art) is a block diagram illustrating exemplary manners in which the CAM modules 10 in the newer generation CAM 9 of FIG. 2a might be configured to different depths and widths. For example, each CAM module 10 of FIG. 2a may be arranged into an 8K×64 configuration 13, a 16K×32 configuration 14, or a 4K×128 configuration 15, as shown. Different configuration options like these are typically very useful, since applications vary widely and have different CAM width requirements. Unfortunately, even though the newer generation CAM 9 is more flexible than the older CAM 1, it shares the same basic architecture and can still only handle one search per clock cycle.
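A short calculation shows what the three example configurations have in common. The sketch below assumes that K means 1024 and that each figure is depth (number of entries) by width (bits per entry); under that reading all three hold the same number of bits, which is consistent with one fixed block of storage being partitioned in different ways.

```python
# Assumptions: K = 1024; each pair is (depth in entries, width in bits).
CONFIGURATIONS = {
    "8K x 64": (8 * 1024, 64),
    "16K x 32": (16 * 1024, 32),
    "4K x 128": (4 * 1024, 128),
}


def capacity_bits(depth, width):
    """Total storage of one CAM module configuration, in bits."""
    return depth * width


for name, (depth, width) in CONFIGURATIONS.items():
    print(f"{name}: {capacity_bits(depth, width) // 1024} Kbit")
```

Each line of output reports 512 Kbit, so trading depth for width does not change the module's total capacity.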
Currently, the best performance search functionality in network systems is implemented using CAM devices, as described above, and FIG. 3 (background art) is a block diagram illustrating this in a typical CAM-based network system 16. Here a network processor 17 (typically an application specific integrated circuit, ASIC) begins a search operation by moving data 18 to be searched to a CAM device 19, where a network information database has been prestored. A search result 20 is produced by the CAM device 19 and sent to a memory unit 21 (typically a static random access memory, SRAM), where an associate network database is stored. Finally, an associate result 22 travels back to the network processor 17 and the search operation is complete. This search cycle repeats several times for each data packet that is received, since multiple database searches are usually required per packet.
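The search cycle of FIG. 3 can be sketched as two chained lookups. Everything concrete here is hypothetical: the addresses, the port strings, and the function names are invented for illustration, and the two Python lists merely stand in for the CAM device 19 and the memory unit 21.

```python
# Hypothetical contents; the lists stand in for the CAM device and the SRAM.
cam_device = ["10.0.0.1", "10.0.0.2", "192.168.1.7"]  # network information database
memory_unit = ["port 3", "port 1", "port 4"]          # associate network database


def cam_lookup(data):
    """CAM device: return the address of the entry matching `data`, else None."""
    try:
        return cam_device.index(data)
    except ValueError:
        return None


def search_operation(data):
    """One search cycle as seen from the network processor.

    data -> CAM device -> search result (an address) -> memory unit ->
    associate result, which travels back to the network processor.
    """
    search_result = cam_lookup(data)
    if search_result is None:
        return None
    return memory_unit[search_result]
```

Calling `search_operation("10.0.0.2")` returns the associate result stored at the address the CAM reports, and a packet needing several lookups would repeat this call once per lookup.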
Various CAM devices currently exist and are in use. The MU9C4320L part by Music Semiconductors is a fixed 4K×32-bit CAM. As such, it is an example of the art similar to that represented by FIG. 1 (background art). The SCT2000 part by SiberCore Technologies has a 2M CAM array that is configurable to handle 36-bit, 72-bit, 144-bit and 288-bit entries. The LNI7010 and LNI7020 parts by Lara Networks are configurable to handle 34-bit, 68-bit, 136-bit and 272-bit entries. These are examples of the art similar to that represented in FIG. 2a (background art).
As noted above, current CAM devices, and in turn the search engines using them, have two major shortcomings. First, current search engine architecture permits very limited capability for configuring the width and depth of the CAM modules. This constraint leads to poor resource utilization and increases the overall cost of systems using such CAM devices. Second, current search engine architecture can only support one search (accept one data input) per clock cycle. Since deep packet analysis, which is necessary in intelligent network systems, requires many searches per packet, it is beneficial for a search engine to support multiple searches per clock cycle. Accordingly, a different search engine architecture is needed.
Accordingly, it is an object of the present invention to provide a more powerful search engine, one which supports multiple search and lookup operations per clock cycle, operating in parallel across multiple databases simultaneously.
Another object of the invention is to provide a search engine which provides better utilization, by providing finer control of the depth and width of embedded CAM resources.
And another object of the invention is to provide a search engine which supports a user configurable instruction set that allows flexibility in constructing search sequences and data input configurations.
Briefly, one preferred embodiment of the present invention is a search engine for comparing a data set against one or more databases. Multiple content addressable memory (CAM) modules are provided which are suitable for pre-storing the databases. Each CAM module has a module input suitable for accepting a datum, a module output suitable for providing a result, and a cascade bus suitable for interconnection with others of the CAM modules. A data dispatch unit is provided to receive the data set into the search engine, at a data input, and to process the data set into data and communicate the data to the module inputs of the CAM modules. A result dispatch unit receives the results from the module outputs of the CAM modules, processes them into the comparison results, and communicates those out of the search engine at a result output. To configure the search engine for this, an instruction unit receives a search instruction, at an instruction input, for configuring either or both of the data dispatch unit and result dispatch unit.
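The data flow just summarized can be sketched in software. This is a loose illustration under several assumptions rather than the embodiment itself: the `CamModule` class, the `run_search` function, and the dictionary layout of the search instruction (`"dispatch"` and `"report"` keys) are all hypothetical, the cascade bus is omitted, and the modules are searched one after another where hardware would search them in parallel.

```python
class CamModule:
    """Hypothetical CAM module: pre-stores one database and searches it."""

    def __init__(self, contents):
        self.contents = list(contents)

    def search(self, datum):
        """Module input -> module output: address of first match, or None."""
        for address, stored in enumerate(self.contents):
            if stored == datum:
                return address
        return None


def run_search(modules, instruction, data_set):
    """Sketch of one search under an assumed instruction layout.

    instruction["dispatch"] maps a module index to the field of the
    data set that module receives (data dispatch unit).
    instruction["report"] lists the module indices whose results are
    returned, in order (result dispatch unit).
    """
    # Data dispatch: split the data set into one datum per selected module.
    data = {m: data_set[field] for m, field in instruction["dispatch"].items()}
    # Each selected module searches its own database (sequential here).
    results = {m: modules[m].search(datum) for m, datum in data.items()}
    # Result dispatch: assemble the comparison results for output.
    return [results[m] for m in instruction["report"]]


modules = [
    CamModule(["10.0.0.1", "10.0.0.2"]),  # hypothetical address database
    CamModule(["low", "high"]),           # hypothetical priority database
]
instruction = {"dispatch": {0: "address", 1: "priority"}, "report": [0, 1]}
packet = {"address": "10.0.0.2", "priority": "high"}
comparison_results = run_search(modules, instruction, packet)
```

The point of the sketch is that a single data set yields several searches against different databases in one pass, with the instruction deciding which fields go to which modules and which results are reported.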
An advantage of the present invention is that it improves the performance of hardware systems employing it dramatically, since multiple search operations can take place in parallel.
Another advantage of the invention is that it is highly configurable and efficient, making it able to handle a wide variety of different database sizes.
Another advantage of the invention is that multiples of it may be cascaded together to permit even more powerful search and lookup operations.
Another advantage of the invention is that it may be user programmable, providing more powerful capabilities and simplifying its integration into larger circuits and with various other circuit components.
Another advantage of the invention is that it reduces the overall system pin count significantly, since inter-chip communication traffic is minimized.
And another advantage of the invention is that it considerably reduces the necessary ASIC or network processor complexity in systems employing it, since multiple search operations may occur in the search engine rather than in the ASIC or network processor.
These and other objects and advantages of the present invention will become clear to those skilled in the art in view of the description of the best presently known mode of carrying out the invention and the industrial applicability of the preferred embodiment as described herein and as illustrated in the several figures of the drawings.