Content addressable memories (CAMs) are widely used in applications where extremely fast search on a database is required, for instance, in networking, imaging, voice recognition, etc. For example, in network engines, CAMs are used to perform a fast search in the database, corresponding to the header field of any packet, and forward the packet to the corresponding matched address.
There are two kinds of CAMs which are primarily used: a binary CAM (BCAM) and a ternary CAM (TCAM). The TCAM discussed herein is referred to as an XY TCAM.
Because a very fast search is required, search speed is a critical performance parameter for CAMs. The basic search mechanism is also very power intensive, owing to the parallel nature of the operation. Hence, it is extremely important for an XY TCAM design to achieve the best possible search performance with the least dynamic power expenditure for said search.
Reference is now made to FIG. 1 which is a schematic diagram of a standard XY ternary content addressable memory (TCAM) cell 10. The TCAM cell 10 is composed of two conventional six transistor (6T) static random access memory (SRAM) cells 12 and 14.
A first SRAM cell 12 forms a data cell of the TCAM cell 10 and includes two cross-coupled CMOS inverters 16 and 18, each inverter including a series connected p-channel and n-channel transistor pair. The inputs and outputs of the inverters 16 and 18 are coupled to form a latch circuit having a data true (DT) node 20 and a data complement (DC) node 22. The first SRAM cell 12 further includes two transfer (pass gate) transistors 24 and 26 whose gate terminals are coupled with a data word line node and are controlled by the signal present on a data word line (DWL). Transistor 24 is source-drain connected between the true node 20 and a node associated with a true bit line (BLT). Transistor 26 is source-drain connected between the complement node 22 and a node associated with a complement bit line (BLC).
A second SRAM cell 14 forms an enable cell of the TCAM cell 10 and includes two cross-coupled CMOS inverters 36 and 38, each inverter including a series connected p-channel and n-channel transistor pair. The inputs and outputs of the inverters 36 and 38 are coupled to form a latch circuit having an enable true (ET) node 40 and an enable complement (EC) node 42. The second SRAM cell 14 further includes two transfer (pass gate) transistors 44 and 46 whose gate terminals are coupled with an enable word line node and are controlled by the signal present on an enable word line (EWL). Transistor 44 is source-drain connected between the true node 40 and a node associated with the true bit line (BLT). Transistor 46 is source-drain connected between the complement node 42 and a node associated with the complement bit line (BLC).
The TCAM cell 10 is further composed of a comparator circuit 60 operable to compare the input search data bit with contents of data cell 12 and enable cell 14 and drive a match line (MATCH) accordingly. The comparator circuit 60 comprises a first transistor 80 and second transistor 82 having source-drain paths coupled in series between a node for the match line (MATCH) and a node for a reference voltage (for example, ground). The first transistor 80 has a gate terminal coupled to the true node 20 (DT) of the first SRAM (data) cell 12. The second transistor 82 has a gate terminal coupled to a true search node associated with a true search line (SLT). The comparator circuit 60 still further comprises a third transistor 90 and fourth transistor 92 having source-drain paths coupled in series between the node for the match line (MATCH) and the node for the reference voltage (for example, ground). The third transistor 90 has a gate terminal coupled to the true node 40 (ET) of the second SRAM (enable) cell 14. The fourth transistor 92 has a gate terminal coupled to a complement search node associated with a complement search line (SLC).
A TCAM with w words and b bits using the TCAM cells 10 of FIG. 1 is typically organized in an array (or matrix) format having w rows and b columns as shown in FIG. 2. Because there are two SRAM cells 12 and 14 per TCAM cell 10, the CAM array will include 2*w rows of SRAM cells. The memory has log2(w) address bits (A_bus), “b” data inputs (D), “b” data outputs (Q), “b” search inputs (SD) and “w” match outputs (MO). The memory further has an input pin “SEL” to control whether a read/write operation is to be performed on the data cell or the enable cell, an input pin Chip Select (CSN) to control initiation of a valid cycle, an input pin Write Enable (WEN) to control whether the initiated cycle is a Read or Write Cycle, and another input pin Search Enable (SEARCH) to control whether the initiated cycle is a Search Cycle. The bit line ports “BLT” and “BLC” are shared between the data and enable cells 12 and 14 of each TCAM cell 10, and are also shared in a column with corresponding “BLT” and “BLC” ports respectively of all other TCAM cells in the same column (as defined by bit lines extending across the array along the columns). The data word line port “DWL” of the data cell is connected to corresponding “DWL” ports of all other data cells 12 in the same row (as defined by a data word line extending across the columns of the array along each row). Similarly, the enable word line port “EWL” is shared with the “EWL” ports of all other enable cells 14 in the same row (as defined by an enable word line extending across the columns of the array along each row). The search data line ports “SLT” and “SLC” of each TCAM cell in a column are connected with “SLT” and “SLC” ports respectively of all other TCAM cells in the same column (as defined by search data lines extending across the rows of the array along each column).
The port “MATCH” of a TCAM cell in any row is connected to corresponding “MATCH” ports of all other TCAM cells in the same row (as defined by a match line extending across the columns of the array along each row).
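The pin and row counts that follow from this organization can be tabulated with a short sketch. The function name and the returned keys are hypothetical, and rounding the address width up for a word count that is not a power of two is an assumption; the text only states log2(w).

```python
def tcam_array_shape(w: int, b: int) -> dict:
    """Derive pin and row counts for a w-word, b-bit TCAM array.

    Hypothetical helper for illustration only.
    """
    if w < 1 or b < 1:
        raise ValueError("w and b must be positive")
    return {
        # ASSUMPTION: round up when w is not a power of two.
        # (w - 1).bit_length() equals ceil(log2(w)) for every w >= 1.
        "address_bits": (w - 1).bit_length(),
        "data_inputs": b,      # D
        "data_outputs": b,     # Q
        "search_inputs": b,    # SD
        "match_outputs": w,    # MO, one per row
        "tcam_cells": w * b,
        "sram_rows": 2 * w,    # one data row and one enable row per word
        "sram_cells": 2 * w * b,
    }


if __name__ == "__main__":
    for name, value in tcam_array_shape(64, 32).items():
        print(f"{name}: {value}")
```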
Operation of the TCAM array will now be described.
In the idle mode, the “DWL” and “EWL” lines are driven to logic low, and the “BLT/BLC” lines are precharged to logic high. Furthermore, the “SLT/SLC” lines are driven to logic low, and the “MATCH” line is precharged to logic high.
In the read operation, controlled by the combination of Chip Select (CSN)=0, Search Enable (SEARCH)=0 and Write Enable (WEN)=1, the “DWL” or “EWL” line corresponding to the addressed row (A_bus) is asserted to logic high, depending on whether the read is from the data cell or the enable cell (based on the logic state of “SEL”). Precharge of the “BLT/BLC” lines of all columns is turned off, resulting, in each column, in discharge of one of the “BLT” or “BLC” lines depending on which of “DT” or “DC” (for a read of the data cell) or “ET” or “EC” (for a read of the enable cell) is logic low. This discharge of either the “BLT” or “BLC” line in each column is sensed by a corresponding sense amplifier and is transferred to the corresponding output (Q) as a logic 0 or logic 1 value. The “DWL” or “EWL” line is then driven back to logic low and the “BLT/BLC” lines are precharged back to logic high.
In the write operation, controlled by the combination of Chip Select (CSN)=0, Search Enable (SEARCH)=0 and Write Enable (WEN)=0, the “DWL” or “EWL” lines corresponding to an addressed row (A_bus), and depending on the requirement to write to the data cell or the enable cell (based on the logic state of “SEL”), are asserted to logic high. Precharge of the “BLT/BLC” lines of all columns is turned off and either “BLT” or “BLC” line for any column is driven to logic low (with the other maintained at logic high), depending respectively on a corresponding column's data to be written (D) state being either a logic 0 or logic 1 value. This results in writing a logic low on “DT” or “ET” and storing a logic high on “DC” or “EC” of the selected row of any column in case where a logic 0 value is written on that column, and vice versa. The “DWL” or “EWL” line is then driven back to logic low and the “BLT/BLC” lines are precharged back to logic high.
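The control-pin combinations described for the read and write cycles, together with the Search Enable pin introduced earlier, can be summarized as a small decoder. This is a behavioral sketch only: the function names are hypothetical, the polarity of “SEL” (0 selects the data cell) is an assumption because the text does not state it, and treating WEN and SEL as ignored during a search cycle is likewise an assumption.

```python
from typing import Tuple


def decode_cycle(csn: int, search: int, wen: int, sel: int) -> str:
    """Map the control pins to the cycle they initiate."""
    if csn != 0:
        return "idle"  # no valid cycle unless CSN=0
    if search == 1:
        return "search"  # ASSUMPTION: WEN and SEL are ignored here
    # ASSUMPTION: SEL=0 targets the data cell, SEL=1 the enable cell.
    target = "data" if sel == 0 else "enable"
    op = "read" if wen == 1 else "write"
    return f"{op} {target}"


def write_bitlines(d: int) -> Tuple[int, int]:
    """Return the (BLT, BLC) levels driven when writing input bit D.

    D=0 pulls BLT low and keeps BLC high; D=1 does the opposite,
    so the true node ends up storing D.
    """
    return (1, 0) if d == 1 else (0, 1)


if __name__ == "__main__":
    print(decode_cycle(csn=0, search=0, wen=1, sel=0))
    print(decode_cycle(csn=0, search=0, wen=0, sel=1))
    print(write_bitlines(0))
```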
In the search operation, controlled by the combination of Chip Select (CSN)=0 and Search Enable (SEARCH)=1, precharge of the “MATCH” lines of all the rows is turned off. The “SLT/SLC” lines of all the columns are driven to logic 0/1 or logic 1/0 values, depending on the corresponding search data bit (SD); i.e., SLT=1 and SLC=0 if SD=1, and SLT=0 and SLC=1 if SD=0. The comparator of any column of any row affects the corresponding row's “MATCH” line as follows:
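SLT-SLC | DT-ET = 0 0 (Always hit) | DT-ET = 0 1 | DT-ET = 1 1 (Always miss) | DT-ET = 1 0
0 1 | 1 | 0 | 0 | 1
1 0 | 1 | 1 | 0 | 0
(An entry of 1 indicates that the comparator does not discharge the “MATCH” line; an entry of 0 indicates that it does.)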
As can be seen from above truth table, the comparator of any bit of any row will NOT discharge the “MATCH” line of that row, if i) both the data cell 12 and enable cell 14 of that bit store a 0 OR ii) the stored bit in the data true node 20 (DT) in corresponding locations of data cells 12 does not match data driven on SLT AND correspondingly, the stored bit in the enable true node 40 (ET) in the enable cell 14 does not match data driven on SLC. This condition when the XY TCAM Cell of a particular location does not discharge a corresponding “MATCH” line may be called a “HIT on that bit”. Similarly, the comparator of any bit of any row will discharge the “MATCH” line of that row, if i) both the data cell 12 and enable cell 14 of that bit store a 1 OR ii) either the stored bit in the data true node 20 (DT) in corresponding locations of data cells as well as driven data on SLT are 1 OR the stored bit in enable true node 40 (ET) in corresponding locations of enable cells as well as driven data on SLC are 1. This condition when the XY TCAM Cell of a particular location discharges the corresponding “MATCH” line may be called a “MISS on that bit”. Thus, the “MATCH” line corresponding to any row will NOT discharge if there is a “HIT” on all the bits of that row, which may be called a “HIT on that row”. Similarly, the “MATCH” line corresponding to any row will discharge if there is a “MISS” on one or more (at least one) bits of that row, which may be called a “MISS on that row”. The “discharge” or “non-discharge” of all the “MATCH” lines is sensed by a next stage of sense amplifier, and is transferred as match output (MO) of the corresponding row. Thus, there are “w” match outputs (MO), of which, only few will go or remain high, corresponding to address locations which have a “HIT”, and other address locations will either go low or remain low, corresponding to address locations which have a “MISS” (i.e., mismatch of at least one bit). 
The search lines “SLT” and “SLC” are then driven back to logic low, and the “MATCH” lines are precharged back to logic high, making the TCAM ready to accept the next cycle.
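The search behavior described above can be captured in a short behavioral model. It is a sketch of the logic only, not of the circuit timing: a cell pulls the “MATCH” line down when DT and SLT are both 1 or when ET and SLC are both 1, and a row reports a HIT only when no cell in it pulls down. The function names and the list-of-tuples representation of the stored array are hypothetical.

```python
from typing import List, Tuple

Cell = Tuple[int, int]  # (DT, ET) stored in the data and enable cells


def search_lines(sd: int) -> Tuple[int, int]:
    """Return (SLT, SLC) for a search data bit: SD=1 -> (1, 0), SD=0 -> (0, 1)."""
    return (1, 0) if sd == 1 else (0, 1)


def cell_discharges(dt: int, et: int, slt: int, slc: int) -> bool:
    """True when either series transistor pair conducts (a MISS on that bit)."""
    return bool((dt and slt) or (et and slc))


def search(rows: List[List[Cell]], sd_bits: List[int]) -> List[int]:
    """Return the match outputs (MO), one per row: 1 for HIT, 0 for MISS."""
    outputs = []
    for row in rows:
        miss = any(
            cell_discharges(dt, et, *search_lines(sd))
            for (dt, et), sd in zip(row, sd_bits)
        )
        outputs.append(0 if miss else 1)
    return outputs


if __name__ == "__main__":
    stored = [
        [(0, 1), (1, 0), (0, 0)],  # last bit stores 0/0, so it always hits
        [(1, 0), (1, 0), (0, 0)],
    ]
    print(search(stored, [1, 0, 1]))
```

With this encoding, a cell storing DT=1, ET=0 hits only for SD=0, a cell storing DT=0, ET=1 hits only for SD=1, and a cell storing 0 in both acts as a don't-care.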
Because a search operation may result in multiple “MISS” rows (more MISS than HIT rows are expected in a typical database search scenario), the corresponding discharge of the match lines, and their subsequent precharge at the end of the cycle, consumes a large amount of dynamic power and contributes to peak power. Only the match lines of address locations which have a “HIT” do not discharge. It may also be observed that the slowest discharge of the “MATCH” line occurs when there is a single bit “MISS” for an address location, and the “MATCH” line sense enable signal has to be designed to correctly sense the small amount of discharge on the “MATCH” line in such a case. However, in many other words (i.e., address locations), there is a possibility of a multiple bit MISS, which would discharge the MATCH lines of those words by a larger amount. This is the case in many instances in typical usage, leading to higher discharge and hence an undesirable increase in dynamic power.
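The dependence of precharge energy on the number of MISS rows and on the depth of discharge can be illustrated with a first-order estimate. The sketch relies on the standard result that restoring a capacitance C to the supply voltage after it has dropped by ΔV draws C·Vdd·ΔV from the supply. The function name and all numeric values are hypothetical and chosen for illustration only; they are not taken from any particular design.

```python
from typing import Optional


def match_precharge_energy(
    n_miss_rows: int,
    c_match: float,
    vdd: float,
    delta_v: Optional[float] = None,
) -> float:
    """Estimate supply energy (joules) to re-precharge discharged MATCH lines.

    n_miss_rows: rows whose MATCH line discharged during the search
    c_match:     capacitance of one MATCH line, in farads
    vdd:         precharge (supply) voltage, in volts
    delta_v:     voltage drop on each discharged line; defaults to a
                 full discharge (delta_v = vdd)
    """
    if delta_v is None:
        delta_v = vdd
    return n_miss_rows * c_match * vdd * delta_v


if __name__ == "__main__":
    # ASSUMED values: 1000 MISS rows, 50 fF per MATCH line, 1.0 V supply.
    shallow = match_precharge_energy(1000, 50e-15, 1.0, delta_v=0.2)
    full = match_precharge_energy(1000, 50e-15, 1.0)
    print(f"shallow discharge: {shallow:.3e} J, full discharge: {full:.3e} J")
```

Under this model the energy scales linearly with both the MATCH line capacitance and the discharge depth, which is why multiple bit MISS rows cost more than single bit MISS rows.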
As previously described, an XY TCAM cell 10 is composed of two six transistor SRAM cells 12 and 14, and comparator logic 60, which are connected as described earlier. In any foundry for a given technology node, one or more six transistor SRAM cells and their corresponding layouts are available, which are optimized for either density, performance (read current) or lower voltage of operation. Layouts of these SRAM cells are optimized from various aspects, with even certain design rule derogations corresponding to that technology node, in order to achieve highest possible density for that particular cell. Considerable cost (time and effort) is incurred in tuning the process steps, in order to achieve a good yield for those memory cells. FIG. 3 illustrates a commonly used horizontal layout topology for a six transistor SRAM cell. The layout of FIG. 3 is referred to as a “horizontal” layout because the perimeter outline of SRAM has the shape of a rectangle and the cell layout is oriented with its longer side in the horizontal direction (corresponding to the row direction of the memory array).
The six transistor SRAM cell topology of FIG. 3 is used in an SRAM memory array organized in rows and columns, with cells in the same row sharing the word lines and those in the same column sharing the bit lines. Since the cell is horizontally oriented (i.e., the height of the cell is much less than the width of the cell), for most of the practical range of numbers of words and bits the memory array is able to have much shorter bit lines than word lines, and hence a lower capacitance on the bit lines. This makes the FIG. 3 cell topology the preferred one for an SRAM, since bit line capacitance is a primary parameter governing SRAM read performance.
The cell topology for this “horizontal” memory cell of FIG. 3 has remained more or less the same across the past many process technologies, and is likely to remain the same in future technologies (i.e., the individual cells in SRAM memories will continue to have a “horizontal” aspect ratio), owing to the fact that it is able to minimize the capacitance on the bit lines.
Also the aspect ratio of the SRAM cell (i.e., “x dimension/y dimension”) is progressively becoming more and more skewed towards the x dimension with technology evolution.
As earlier mentioned, tuning the process for any given memory cell (for example, to guarantee good yield) incurs a high cost. Hence, in general, the two SRAM cells required for a TCAM cell in any technology are simply taken as one of the SRAM cells on offer in that process technology, and additional circuitry is provided for implementing the comparator portion. FIG. 4 illustrates the topology for a conventional TCAM cell which is based on the use of two SRAM cell topologies (FIG. 3) plus device layout supporting the comparator portion. FIGS. 5A, 5B and 5C show the layout for metal 2, metal 3 and metal 4 for the conventional TCAM cell topology of FIG. 4. Dotted outlines of the layouts of the SRAM cells 12 and 14 and comparator 60 circuitry for the TCAM cell 10 are provided in FIGS. 5A, 5B and 5C.
The FIG. 4 TCAM cell is organized, as previously explained, with cells in adjacent rows sharing the “BLT/BLC” lines, and those in adjacent columns sharing the “DWL”, “EWL” and “MATCH” lines. Even with two SRAM cells that correspond to data and enable cells being provided one over another in adjacent rows, the aspect ratio of the cell is still quite skewed in favor of the x dimension (i.e., x dimension>y dimension). Also, in order to make the comparator faster, if the comparator devices are made bigger, then for the FIG. 4 topology, there is a further increase in the horizontal extent of the cell, resulting in an even higher skew in aspect ratio. Since the “MATCH” line travels horizontally (i.e., between adjacent cells 10 in the array in the horizontal direction across columns along the rows and parallel to the longer side edge of the rectangular-shaped layout), the higher horizontal extent results in increased routing capacitance of the “MATCH” line, which results in increased dynamic power and slower search operation speed. To emphasize this point, reference is made to FIGS. 2 and 5B which clearly illustrate the bit lines travelling vertically between adjacent cells 10 in the array in the vertical (column) direction across the rows along columns parallel to the shorter side edge of the rectangular-shaped SRAM cell layout, while FIGS. 2 and 5C clearly illustrate the MATCH line travelling horizontally between adjacent cells 10 in the array in the horizontal (row) direction across columns along rows parallel to the longer side edge of the rectangular-shaped SRAM cell layout.
There is a need in the art for a better TCAM cell and array topology.
It is possible to present a new topological organization of the devices for the TCAM cell, including those of the six transistor SRAM cell, in order to achieve a lower “MATCH” line capacitance. The technical literature discloses a number of possibilities, but these possibilities exhibit known limitations including: high cost incurred in order to fully validate all aspects of the new memory cell, including the SRAM cell portion; and loss, with a new topology, of the density advantage associated with the design rule derogations in existing SRAM cell topologies, which are managed in manufacturing through a rigorous and iterative process.
Another method to improve the performance of the search operation is to increase the strength (i.e., width) of the transistors forming the comparator portion, so as to enable faster “MATCH” line discharge for the case of a “MISS”. This solution has two limitations. First, the capacitance of the “MATCH” line increases because of the larger devices connected to it. Second, in the prior art layout topology, an increase in the width of the comparator devices implies an increase in the width of the TCAM cell, which further increases the routing capacitance of the “MATCH” line. The gain in both dynamic power and search speed is therefore limited, as the increase in strength of the comparator devices is partially offset by the increase in capacitance of the “MATCH” line.
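The diminishing return from widening the comparator devices can be seen in a first-order model of the single bit MISS case. The sketch below assumes that discharge current scales linearly with device width, that each cell adds device capacitance proportional to that width, and that the cell (and hence the “MATCH” line routing) grows by the same amount as the device is widened. The function name, these scaling assumptions and all parameter values are hypothetical.

```python
def single_miss_discharge_time(
    width: float,
    bits: int = 64,
    base_cell_width: float = 1.0,
    c_dev_per_width: float = 1.0,
    c_wire_per_length: float = 0.2,
    i_per_width: float = 50.0,
    delta_v: float = 0.1,
) -> float:
    """Time (arbitrary units) for one comparator to pull MATCH down by delta_v.

    ASSUMED model: t = C_total * delta_v / I, with
      C_total = bits * (c_dev_per_width * width
                        + c_wire_per_length * (base_cell_width + width))
      I       = i_per_width * width
    """
    if width <= 0:
        raise ValueError("width must be positive")
    c_cell = c_dev_per_width * width + c_wire_per_length * (base_cell_width + width)
    c_total = bits * c_cell
    current = i_per_width * width
    return c_total * delta_v / current


if __name__ == "__main__":
    for w in (1.0, 2.0, 4.0, 8.0):
        print(f"width {w}: t = {single_miss_discharge_time(w):.4f}")
```

In this model the discharge time approaches a floor set by the per-width capacitance terms as the width grows, so doubling the device width yields far less than a twofold speedup, while the “MATCH” line capacitance, and with it the dynamic power, keeps increasing.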
Thus, there would be an advantage if an operationally improved TCAM design could be provided which advantageously utilized the well known horizontal SRAM topology while also favoring a capacitance reduction of the “MATCH” line.