The present invention relates to the field of content addressable memories (CAMs). More specifically, in one embodiment the invention provides an improved means for sensing the contents of a CAM cell.
CAMs are commonly used in cache memories and translation look-aside buffers (TLBs). A TLB is used to translate virtual addresses into physical addresses, usually in the context of a virtual memory management unit. The translation is accomplished in a TLB by coupling the outputs of a CAM with a random-access memory (RAM) storing physical addresses. In a translation operation, a virtual address is input to the CAM, and at most one of the CAM locations indicates that its contents, a virtual address, match the input virtual address. A match indication signal indicating which CAM location matched is used to address a RAM location, where a CAM location and a corresponding RAM location define a virtual-physical address pair. The RAM then outputs the physical address in the addressed RAM location, thus completing the translation cycle.
A cache memory operates in a similar manner, except that the data stored in the CAM and the RAM are not necessarily address values. Although the following discussion refers to TLBs, similar extensions of the concepts discussed are applicable to caches as well.
A CAM gets its name from the fact that a value in the CAM is not located by its location, but by the contents of the CAM (although dual-mode addressing, by content and by location, is a feature of some CAMs). Thus, a CAM is addressed by applying a value to an input of the CAM, and the CAM responds by indicating which, if any, memory location contains the value input.
Because a TLB is used in address translation, it lies in the critical path of a memory access, and therefore any delay or speed improvement in performing the translation will show up in the memory access time. The cause of delay in a TLB will become apparent from the following discussion of how a TLB circuit is laid out.
FIG. 1 is a block diagram of a computer system 10, including a CPU 12 which has an output coupled to an input of a TLB 14. TLB 14 in turn has an output coupled to an address input of a memory 16, which outputs data on a data bus 18 back to CPU 12.
In a memory access operation, CPU 12 outputs a virtual address which CPU 12 wishes to access. The virtual address is input to TLB 14, and a short time later, TLB 14 outputs a physical address to memory 16. While memory 16 is shown as a single block in FIG. 1, memory 16 may be a complex memory system including caches and other memory devices. For the present discussion, it is only important that memory 16 outputs data words based on physical addresses input to memory 16. As FIG. 1 shows, the delay between the output of a virtual address from CPU 12 and the input of data by CPU 12 from data bus 18 is dependent on the delay between the input of a virtual address to TLB 14 and the output of a physical address from TLB 14.
FIG. 2 is a block diagram showing TLB 14 in greater detail. In TLB 14, the virtual address input is coupled to a CAM 24. CAM 24 has L rows of CAM cells, with M cells per row, and one match line output for each row. The L match lines are inputs to a RAM data array 26, which has L rows, or words, of N bits each. RAM 26 has an N-bit output bus for outputting a physical address.
Thus, when an M-bit virtual address is input to CAM 24, it is matched against L virtual addresses, although all rows need not contain valid virtual addresses, and the match line output is asserted for the row containing a valid, matching virtual address. The values for the integers L, M, and N need not be any particular value, but one example is a TLB with 64 rows, or entries, with 20-bit virtual addresses and 20-bit physical addresses (L=64, M=20, N=20). While M and N are equal in this example, this is not a necessary condition of a TLB.
Mechanisms for ensuring either that only one match is possible or that only one match is acknowledged, and that invalid entries are ignored, are well known in the art of TLB construction. Mechanisms for handling exceptions where no match is found are also well known in the art. Therefore, the following discussion will assume, without loss of generality, that one and only one match line output is asserted for an input virtual address.
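The translation path described above can be illustrated with a minimal behavioral sketch. The class name `TLB`, its methods, and the example addresses are hypothetical and chosen only for illustration. The sketch compares rows one after another, whereas the hardware compares all L rows in parallel.

```python
from typing import List, Optional


class TLB:
    """Toy model of a TLB: a CAM of virtual addresses driving a RAM of physical addresses."""

    def __init__(self, rows: int) -> None:
        self.cam: List[Optional[int]] = [None] * rows  # None marks an invalid entry
        self.ram: List[int] = [0] * rows

    def write(self, row: int, virtual: int, physical: int) -> None:
        # A CAM location and the RAM location in the same row form one address pair.
        self.cam[row] = virtual
        self.ram[row] = physical

    def match_lines(self, virtual: int) -> List[bool]:
        # One match line per row; invalid rows never assert their match line.
        return [entry is not None and entry == virtual for entry in self.cam]

    def translate(self, virtual: int) -> Optional[int]:
        lines = self.match_lines(virtual)
        if True not in lines:
            return None  # no match; exception handling is outside this sketch
        # The asserted match line selects the RAM word to output.
        return self.ram[lines.index(True)]


tlb = TLB(rows=64)  # L = 64, as in the example above
tlb.write(3, virtual=0x12345, physical=0xABCDE)
print(hex(tlb.translate(0x12345)))  # 0xabcde
```

Returning `None` on a miss stands in for the exception mechanisms mentioned above, which this sketch does not model.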
FIG. 3 is a block diagram showing CAM 24 in greater detail. Within CAM 24, each input bit is coupled to a bit line controller 30, which has two outputs labelled BL and L_BL. CAM 24 is shown as a two-dimensional array of CAM cells 39, although for clarity, not all M×L cells are shown. Each CAM cell 39 is shown in a particular row and column, and is shown coupled to the bit lines BL and L_BL that are associated with the CAM cell's column and coupled to a match line 36 that is associated with the CAM cell's row. A match line 36 is coupled to a recharging means 38 and to a match output of each CAM cell 39 in the row with which the match line 36 is associated. A CAM cell 39 is shown in FIG. 3 as a RAM cell 32 and a comparator cell 34. Each RAM cell 32 has a line coupled to each bit line, BL and L_BL, associated with its CAM cell 39, and two comparator lines coupled to comparator cell 34. Comparator cell 34 in turn has a line coupled to each of the same bit lines, and one line coupled to the match line 36 associated with the row of CAM cell 39.
FIG. 4 is a block diagram showing an example of a CAM cell 39, which comprises RAM cell 32 and comparator cell 34, in greater detail, along with the match line 36 and bit lines, BL and L_BL, associated with CAM cell 39. Also shown in greater detail is an example of recharging means 38.
RAM cell 32 comprises two inverters 40, 42 and two transmission transistors 50, 52. Transistor 50, when turned on, couples a node L_S to bit line L_BL, while transistor 52, when turned on, couples a node S to bit line BL. An output of inverter 40 and an input of inverter 42 are coupled to node S, while an input of inverter 40 and an output of inverter 42 are coupled to node L_S. The gates of transistors 50, 52 are coupled to a CAM write line 51, and nodes S and L_S are coupled to comparator inputs C and L_C, respectively.
Comparator inputs C and L_C are coupled to the gates of transistors 46, 44, respectively. Transistor 44 is coupled at a drain terminal to bit line BL and at a source terminal to a drain terminal of transistor 46, while a source terminal of transistor 46 is coupled to bit line L_BL. While drain and source terminals are indicated on many transistors, these terminals may, at some point in their operation, change roles, since the transistors discussed herein are field-effect transistors. A match transistor 48 is coupled at a gate to the node between transistors 44, 46, at a drain to match line 36, and at a source to ground.
Unless otherwise indicated, the transistors discussed herein, such as transistors 50, 52, 44, 46, and 48, are NMOS FET (N-channel metal-oxide-semiconductor field-effect transistor) devices. In FIG. 4, recharging means 38 comprises a PMOS (P-channel metal-oxide-semiconductor) FET 53 which has a source terminal coupled to V_cc, a drain terminal coupled to match line 36, and a gate terminal coupled to ground (V_ss).
Although FIG. 4 only shows one CAM cell 39, bit lines BL and L.sub.-- BL extend beyond the FIG., indicating that they connect in a similar manner to the other L-1 CAM cells associated with the same bit with which CAM cell 39 is associated. In a similar manner, CAM write line 51 and match line 36 extend beyond the FIG., indicating that they connect in a similar manner to the other M-1 CAM cells which are in the same CAM cell row as CAM cell 39.
Two functions of a CAM cell, writing a value into CAM cell 39 and comparing a value to the value written into CAM cell 39, will now be described with reference to FIGS. 3 and 4.
The value written into CAM cell 39 is either a 1 or a 0. Throughout this disclosure, logical 1 refers to and is interchangeable with a logical high, and a voltage V_cc, while logical 0 refers to and is interchangeable with a logical low, and a voltage V_ss. In a write operation, bit line BL is driven to a high or low voltage equal to the value to be written, while bit line L_BL is driven to its complement, and CAM write line 51 is driven high from its quiescent low voltage. When CAM write line 51 is driven high, transistors 52, 50 connect BL to node S and L_BL to node L_S, respectively. Because of the inverter loop formed by inverters 40, 42, the voltages on nodes S, L_S remain after CAM write line 51 goes low. Thus, so long as CAM write line 51 remains low, the content of RAM cell 32 does not change.
For a compare operation, the input value to be compared against the contents of RAM cell 32 is applied to bit line BL, and its complement is applied to bit line L_BL. If the value on bit line BL matches the value on node S, the values on bit line L_BL and node L_S will also match, and lines C and L_C will match BL and L_BL, respectively. For example, if node S is held high and BL is driven high, line C will be held high, and node L_S, line L_C, and bit line L_BL will all be low. In such a case, transistor 46 will be on and transistor 44 will be off, causing the node between the two transistors 44, 46 to be driven low (since it is coupled to L_BL, which is low), and because that node is coupled to the gate of match transistor 48, match transistor 48 is off. The reverse is also true. Of course, if both bit lines are held low, match transistor 48 is off regardless of the values on lines C, L_C.
Thus, match transistor 48 turns on when the bit lines are driven with complementary values and the value at node S mismatches the value on BL, but otherwise match transistor 48 remains off. Before a compare operation begins, recharging means 38 keeps match line 36 charged up to V_cc. If and when match transistor 48 turns on, a current path is created from match line 36 to ground, and this path discharges match line 36 towards ground (even though recharging means 38 continues to supply current to match line 36). FIG. 3 shows that all comparator cells 34 in a row are connected to that row's match line. Therefore, if any one of the bits input to CAM 24 mismatches against the cells of a row, that row's match line will start to discharge. The number of mismatched cells determines the rate at which the match line will discharge, and if a complete match of each bit in a row occurs, the match line for that row will not discharge. Once the matching row, if any, is detected (i.e., all but one of the match lines have a detectable discharge), the match detect operation is terminated and recharging means 38 begins to recharge match line 36. To allow match line 36 to recharge, the mismatch-indicating match transistors 48 are turned off by driving the bit lines BL and L_BL both low, which can be done by bit line controller 30 in response to a disable signal, which in some embodiments is generated by a match detecting circuit (not shown).
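The compare behavior of a single cell and of a row can be summarized in a small logic-level sketch. The function names are hypothetical, the complement bit line is written `l_bl`, and voltages are reduced to 0 and 1, so the analog discharge rate is not modeled.

```python
from typing import List


def match_transistor_on(s: int, bl: int, l_bl: int) -> bool:
    """Return True when match transistor 48 conducts for one CAM cell.

    s is the stored bit on node S (node L_S holds its complement).
    Line C (equal to S) gates transistor 46, which passes L_BL to the internal node.
    Line L_C (equal to not S) gates transistor 44, which passes BL to the same node.
    """
    internal_node = l_bl if s == 1 else bl
    return internal_node == 1


def mismatch_count(stored: List[int], applied: List[int]) -> int:
    """Number of cells in a row whose match transistor pulls the match line down."""
    return sum(match_transistor_on(s, bl, 1 - bl) for s, bl in zip(stored, applied))


def match_line_stays_high(stored: List[int], applied: List[int]) -> bool:
    """The match line stays charged only when no cell in the row mismatches."""
    return mismatch_count(stored, applied) == 0


row = [1, 0, 1, 1]
print(match_line_stays_high(row, [1, 0, 1, 1]))  # True
print(mismatch_count(row, [1, 1, 0, 1]))         # 2
# With both bit lines low, no match transistor conducts, whatever is stored.
print(any(match_transistor_on(s, 0, 0) for s in row))  # False
```

The last line corresponds to the recharge condition described above, in which bit line controller 30 drives both bit lines low.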
FIG. 5 is a schematic diagram showing inverters 40 and 42 in greater detail. In FIG. 5, inverter 40 is formed by a PMOS device 54 and an NMOS device 56. The gates of PMOS device 54 and NMOS device 56 are tied together and form the input to inverter 40, while the source of PMOS device 54 is tied to V_cc, the drain of PMOS device 54 is tied to the drain of NMOS device 56 which together form the output of inverter 40, and the source of NMOS device 56 is tied to ground. Similarly, inverter 42 is formed by a PMOS device 58 and an NMOS device 60. The gates of PMOS device 58 and NMOS device 60 are tied together and form the input to inverter 42, while the source of PMOS device 58 is tied to V_cc, the drain of PMOS device 58 is tied to the drain of NMOS device 60 which together form the output of inverter 42, and the source of NMOS device 60 is tied to ground.
If CAM 24 is used for a series of translations in quick succession, as is usually the case, each access cycle must be long enough to accommodate the time required for the setup and stabilization of the bit lines, the time required to discharge the match lines enough to be detected, and the time required to recover from a mismatch and recharge the match line. When these times are added together, the result is a long access cycle. As faster computers require faster memory access times, this delay lowers the performance of the computer. A typical sequential CAM access cycle has two periods, a match sensing period and a precharge period. The match sensing period begins with values being placed on the bit lines, and can end anytime after a match is detected. However, since the period times are generally fixed, the fixed match sensing period time must be the detect time for the slowest case, which is where only one bit is mismatched in a CAM cell row. The precharge period can begin after a match sensing period. The time for this period is dictated by how fast the bit lines can be driven low (to stop match transistors 48 from discharging match line 36) and how fast recharging means 38 can recharge match line 36.
Several solutions have been proposed to the problems caused by long CAM access cycles. One solution is to increase the speed of the CAM by using faster technology, such as moving from λ=1 micron technology to λ=0.35 micron technology (λ is a measure of semiconductor feature spacing). For example, one of the causes of delay in a semiconductor circuit is gate capacitance, which can be reduced by moving to lower feature spacing. However, advanced semiconductor technologies do not come without a cost, such as increased heat generation, lower yields and increasingly complex fabrication machinery. Also, with smaller feature sizes, some devices, such as current sources, are less effective if made smaller and begin to take up relatively more chip space as the feature size goes down. Even if smaller feature sizes are used, the need exists for even faster cache response times, since the circuits which need to access the CAM are proportionally faster when made with smaller feature sizes.
Because the time delays discussed above are serial, each adds to the overall delay. Thus, the need exists to reduce the bit line setup time, the match detection delay, and the match line recharge time.
One solution to reduce the match detection delay is illustrated by FIG. 6. FIG. 6 shows one of the match lines 36 input into a match sense amplifying inverter 80. Inverter 80 comprises a PMOS transistor 82 coupled at its source to V_cc, coupled at its drain to an output line 84 of inverter 80, and coupled at its gate to match line 36. Inverter 80 also comprises an NMOS transistor 86 coupled at its drain to output line 84, with a grounded source, and a gate also coupled to match line 36. The width of transistor 82 is indicated as nW, which is n times the width of transistor 86. FIG. 6 also shows a graph of how the voltage on match line 36 affects output line 84, showing the discharge of the match line for one, two, and three mismatched CAM cells.
When match line 36 goes high, transistor 82 turns off and transistor 86 turns on, and the output is coupled to ground, while when match line 36 goes low, transistor 82 turns on and transistor 86 turns off, and the output is coupled to V_cc, thus forming an inverter. As match line 36 discharges, it eventually discharges far enough that output line 84 goes high, indicating a mismatch.
If inverter 80 is symmetrical (n=1), output line 84 would not go high until the match line discharges past V_cc/2. However, if n is greater than one, the threshold voltage at which output line 84 changes is
$$V_T \approx \left(1 - \frac{g}{n+g}\right) V_{cc},$$
where g is the transconductance (gm) ratio of NMOS to PMOS. The equality is not exact, mainly due to the difference between V_tn and V_tp; however, such effects are not germane to the present discussion.
As indicated in FIG. 6, the time, t, required to detect a match is lower for a higher value of n. Shortening the match detect time by raising the value of n is not without cost, however. If n is increased, transistor 82 will occupy more chip space, which is always at a premium in semiconductor designs. Furthermore, the closer the threshold voltage, V_T, gets to V_cc, the more effect noise and variations in V_cc will have on the triggering of inverter 80. The rate at which match line 36 discharges is proportional to the number of mismatch-indicating match transistors 48 which turn on. While inverter 80 may sense the discharging of match line 36 sooner with more than one mismatch, a CAM must be designed with a match detect time sufficient to detect the slowest case, that of a single bit mismatch.
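The trade-off between n and the detect time can be made concrete with a rough numerical sketch. The threshold function follows the approximate relation given above for the threshold voltage of inverter 80. The discharge model is an assumption introduced here for illustration only: each conducting match transistor is taken to sink a constant current, and the current supplied by recharging means 38 is ignored. The capacitance, current, and supply values are hypothetical.

```python
def threshold_voltage(n: float, g: float, vcc: float) -> float:
    """Approximate switching threshold, V_T ~ (1 - g / (n + g)) * V_cc."""
    return (1.0 - g / (n + g)) * vcc


def detect_time(n: float, g: float, vcc: float, mismatches: int,
                c_line: float, i_cell: float) -> float:
    """Time for the match line to fall from vcc to the inverter threshold.

    Assumed model (not from the source): constant discharge current of
    i_cell per mismatching cell, line capacitance c_line, no recharge current.
    """
    if mismatches < 1:
        raise ValueError("a fully matching row does not discharge its match line")
    swing = vcc - threshold_voltage(n, g, vcc)
    return c_line * swing / (mismatches * i_cell)


VCC = 5.0        # hypothetical supply voltage, volts
C_LINE = 1e-12   # hypothetical match line capacitance, farads
I_CELL = 1e-4    # hypothetical per-cell discharge current, amperes

for n in (1, 2, 4):
    vt = threshold_voltage(n, g=1.0, vcc=VCC)
    t = detect_time(n, 1.0, VCC, mismatches=1, c_line=C_LINE, i_cell=I_CELL)
    print(f"n={n}: V_T={vt:.2f} V, single-mismatch detect time={t * 1e9:.1f} ns")
```

Under these assumptions a larger n raises the threshold toward V_cc and shortens the single-mismatch detect time, while also shrinking the margin between the threshold and a fully charged match line, which is the noise concern noted above.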
FIG. 7 illustrates one alternative to increasing the size of the PMOS transistor in inverter 80. FIG. 7 shows an inverting match sense amp 88 which has match line 36 as a negative input, a voltage threshold line as a positive input, and output line 84 as its output. However, such a circuit suffers similar problems. While an internal PMOS transistor may be small, chip space is needed for a circuit to generate a voltage threshold, V_TH, and the problem of noise is still present.
One solution proposed to reduce the match line recharge time is to increase the size of PMOS transistor 53 in recharging means 38. However, this also requires additional chip space, and enlarging transistor 53 increases the overall capacitance seen by match line 36, thus leading to longer discharge times for a given voltage drop on match line 36. Since the discharge rate of match line 36 varies with the number of mismatches, transistor 53 must be designed for the worst case: it must be large enough to replace the charge discharged where all the match transistors 48 in a row indicate mismatches. Chip space is therefore likely to be underutilized by transistor 53. Transistor 53 must also be large enough to keep match line 36 charged even if the bit lines drift up a bit from ground, as this drift will cause some match transistors 48 to turn on slightly.
From the above it is seen that an improved means for keeping a match line from being discharged during the precharging period, an improved means for recharging the match line in a short time, and a means for quickly detecting a match or mismatch are needed.