The present invention relates generally to the field of electronic circuits and in particular to a method of high speed access to a content addressable memory using encoded key fields and a stored encoded key.
Microprocessors perform computational tasks in a wide variety of applications, including embedded applications such as portable electronic devices. The ever-expanding feature set and enhanced functionality of such devices require ever more computationally powerful processors. Hence, processor improvements that increase execution speed are desirable.
Most modern processors capitalize on the spatial and temporal locality properties of most programs by storing recently executed instructions and recently accessed data in one or more cache memories for ready access by an instruction execution pipeline. A cache is a high-speed, usually on-chip, memory structure comprising a Random Access Memory (RAM) and corresponding Content Addressable Memory (CAM). The instructions or data reside in a cache “line” stored in the RAM. To determine whether a particular datum resides in the RAM, a portion of its address is applied to the CAM. A CAM is a particular memory structure wherein an applied compare input (referred to herein as the key or search key) is simultaneously compared to data stored in each CAM entry (referred to herein as a key field), and the output of the CAM is an indication of which, if any, key field matches the key. In a cache, the key and key fields are portions of (virtual or physical) addresses, and if a match occurs (i.e., the access “hits” in the cache), the location of the match indexes the RAM, and the corresponding cache line is accessed.
FIG. 1 depicts a functional block diagram of a portion of one key field of a CAM structure, indicated generally by the numeral 100. The CAM key field j includes a match line 102 that spans all bit positions of the jth key field 110. The match line 102 is pulled high by a PRECHARGE signal turning on the gate of a pass transistor 104 connecting the match line 102 to power. At each bit of the jth CAM entry, a discharge transistor 106 is interposed between the match line 102 and ground. The gate of the discharge transistor 106 is driven by the logical XOR 108 of a key 112 bit and the corresponding key field 110 bit. At each ith bit position, if the key 112 bit and the key field 110 bit match, the output of the XOR gate 108 is low and the discharge transistor 106 does not conduct charge from the match line 102 to ground. If the key 112 bit and the key field 110 bit mismatch, the output of the XOR gate 108 is high, turning on the discharge transistor 106 and pulling the match line 102 low.
In this manner, if any bit of the key 112 mismatches with any corresponding bit of the key field 110, the match line 102 is pulled low. Conversely, only if every bit of the key 112 and the key field 110 match is no path to ground established, and the match line 102 remains high. A sense circuit 114 detects the level of the jth match line 102 at a time determined by the worst-case match line 102 discharge time. If each key field 110 is unique, which is the case in normal cache operation, then only one key field 110 should match the key 112. In that case, only one match line 102 within the CAM will remain high. To ensure this is the case, the output of each match line sense circuit 114 goes to a collision detection circuit 116, which detects multiple matches, and generates an error if they occur. In CAM applications other than a cache memory, multiple matches may occur, and a priority encoder (not shown) may select from among two or more key fields 110 that match an applied key 112.
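The match line behavior described above can be modeled in software. The following is a minimal behavioral sketch only, not a circuit description: the function names `match_line` and `cam_search` are hypothetical, bits are represented as Python integers 0 and 1, and the collision check stands in for the collision detection circuit 116.

```python
def match_line(key_bits, field_bits):
    """Return True if the precharged match line stays high for this entry."""
    if len(key_bits) != len(field_bits):
        raise ValueError("key and key field must have the same width")
    # Each bit position drives one discharge transistor with XOR(key, field).
    discharge = [k ^ f for k, f in zip(key_bits, field_bits)]
    # Any conducting transistor pulls the match line low (mismatch).
    return not any(discharge)


def cam_search(key_bits, key_fields):
    """Return the index of the matching entry, or None on a miss.

    Raises ValueError when more than one key field matches, mimicking
    the error raised by a collision detection circuit in a cache.
    """
    hits = [j for j, field in enumerate(key_fields)
            if match_line(key_bits, field)]
    if len(hits) > 1:
        raise ValueError("collision: multiple key fields match the key")
    return hits[0] if hits else None
```

In a cache, the index returned by `cam_search` would select the corresponding cache line in the RAM; a return value of `None` corresponds to a cache miss.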
The key fields 110 of a representative CAM may be 20 to 30 bits wide, and the CAM may include 256 entries. Thus, the CAM may include 5,000 to over 7,000 match line discharge transistors 106. To implement such a large number of transistors 106 in a small chip area requires that each transistor 106 be small. Since small transistors 106 have lower current-carrying capacity, they require a longer duration to discharge the match line 102 in the event of data miscompare. The worst case is a single-bit miscompare between the key 112 and a key field 110, wherein only one transistor 106 is turned on, and it must carry the current to dissipate all the charge on the match line 102. If two or more bits miscompare, then two or more transistors 106 work in parallel to discharge the match line 102 more rapidly. Consequently, the overall speed of operation of the CAM is determined by the timing of a single-bit miscompare.
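The transistor count and the timing argument above can be checked with a short calculation. This sketch rests on a first-order assumption that is not stated in the text: the discharge transistors 106 are treated as identical, ideal current sinks, so that n conducting transistors discharge the match line in 1/n of the single-transistor time. The function names are hypothetical.

```python
def discharge_transistor_count(entries, key_field_bits):
    """One discharge transistor per bit position of every key field."""
    return entries * key_field_bits


def relative_discharge_time(mismatching_bits):
    """Discharge time relative to the single-bit-miscompare worst case.

    Assumption (not from the text): identical transistors acting as
    ideal parallel current sinks, so time scales as 1/n.
    """
    if mismatching_bits < 1:
        raise ValueError("a matching entry does not discharge its match line")
    return 1.0 / mismatching_bits


# 256 entries of 20-bit and 30-bit key fields.
low = discharge_transistor_count(256, 20)   # 5120
high = discharge_transistor_count(256, 30)  # 7680
```

Under this simplified model, guaranteeing at least two miscomparing bits would halve the worst-case discharge time; a real circuit would deviate from the ideal 1/n scaling.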
Faster CAM operation can therefore be obtained by ensuring that at least two bits of every mismatching key field 110 will miscompare. It is known in the art to encode the key fields 110 (and correspondingly, the key 112) to increase the minimum Hamming distance between them, where the Hamming distance between two digital values is the number of bit positions in which they differ. For example, a minimum Hamming distance of two, which is obtained with single-bit parity, ensures that, for a key 112 and key field 110 that differ by one bit, two bits will miscompare between an encoded version of the key 112 and an encoded version of the key field 110. In particular, the two miscomparing bits in the encoded versions are the bit that differs in the unencoded data, and the parity bit. Thus, encoding the key 112 and the key fields 110 with single-bit parity ensures that at least two bits will miscompare on every match line 102 where there is at least a one-bit difference between the key 112 and the key field 110. This ensures that at least two transistors 106 will pull the match line 102 low in parallel, resulting in faster CAM operation.
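The effect of single-bit parity on the number of miscomparing bits can be illustrated as follows. This is a minimal sketch under stated assumptions: even parity is used, the parity bit is appended as the least significant bit, and the names `parity_encode` and `hamming_distance` are hypothetical. The text does not specify parity polarity or bit placement.

```python
def parity_encode(value, width):
    """Append an even-parity bit to a width-bit value (parity bit is the LSB)."""
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the given width")
    parity = bin(value).count("1") & 1  # 1 if the value has an odd number of 1s
    return (value << 1) | parity


def hamming_distance(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def min_encoded_distance(width):
    """Smallest Hamming distance between any two distinct encoded values."""
    codes = [parity_encode(v, width) for v in range(1 << width)]
    return min(hamming_distance(x, y)
               for i, x in enumerate(codes) for y in codes[i + 1:])
```

For instance, the 4-bit values 0b1010 and 0b1011 differ in one bit, while their encoded forms 0b10100 and 0b10111 differ in two: the original differing bit and the parity bit. An exhaustive check with `min_encoded_distance` over a small width confirms that no two distinct encoded values differ in fewer than two bits.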
The key fields 110 can easily be encoded prior to being written to the CAM, when a cache line is replaced in the processing following a cache miss. However, for a physically tagged cache, part of the key 112 (the page address) is retrieved from a Translation Lookaside Buffer (TLB) that performs virtual-to-physical address translation, and the remainder (the page offset) comprises the lower-order bits of the virtual address generated in the pipeline. In the case of a virtually tagged cache, the entire key 112 is generated in the pipeline. In either case, retrieving or generating the address and accessing the CAM are on the critical timing path, and there is insufficient time to encode all, or a large part, of the key 112 prior to comparing it against the encoded key fields 110 without increasing the machine cycle time.