Electronic components, such as semiconductor memory devices, can provide very rapid, highly compact memory and related functions. However, due to uncontrollable events such devices can generate errors. While manufacturing defects, such as those resulting from uncontrollable process variation, particles and the like, are often readily addressed by redundancy techniques, other types of errors can be more difficult to address.
As but one example, a memory device suffering from no manufacturing defects may still be subject to “soft” errors. Soft errors are most often attributed, either directly or indirectly, to sub-atomic particles traveling through a semiconductor substrate and generating electron-hole pairs. Such electron-hole pairs can cause a change in state of the data stored within a memory device. Soft errors can become particularly problematic as semiconductor device sizes continue to shrink.
To address errors arising from undesirable changes in data values, error correction techniques are known and have been proposed for conventional memory devices, such as dynamic random access memories (DRAMs), static RAMs (SRAMs), and various ROM type memories (EPROMs, EEPROMs and “flash” EEPROMs).
It is noted that the term “error-correction” as used herein, refers to the ability to return correct data values/result (in the event errors exist in stored data values) on every read operation (and also for every search operation in the case of TCAMs). Erroneous data present within a memory device can be corrected according to various techniques well known to those skilled in the art. For example, one conventional approach to addressing errors in stored data values can be to store a corrected data value in temporary storage and find an empty cycle to write it back to the memory location containing an erroneous value.
To better understand various features of the disclosed embodiments, a conventional technique for providing error-correction in an SRAM device will be described with reference to FIG. 19. FIG. 19 is a block diagram showing a packet processing system 1900 that includes a packet processor 1902 and an associated SRAM 1904. Within SRAM 1904 each addressable location can store both a data value 1906-0 and a corresponding error correction code (ECC) value 1906-1. According to well-understood techniques an error correction code can be generated by applying a function to the data value 1906-0.
In an SRAM read operation, an address “addr” can be applied by an SRAM controller within packet processor 1902 to SRAM 1904 to thereby read data from an addressable location. In response, SRAM 1904 can output both a data value and ECC value “data, ecc” (e.g., 1906-0 and 1906-1). For example, error correction logic within an SRAM controller of packet processor 1902 can perform 1-bit correction and 2-bit detection of errors in the read data utilizing the ECC bits.
A resulting corrected (if necessary) data value can be returned to an entity (such as a processor thread being executed by packet processor 1902) that requested the data from the SRAM 1904. Such error correction can be considered “on-the-fly” error correction, as the data is corrected on the fly as it is read.
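The read path described above can be illustrated with a minimal sketch. The text does not say which code the conventional SRAM controller uses, so the sketch assumes an extended Hamming code, which is one common way to obtain 1-bit correction and 2-bit detection. The function names (`ecc_encode`, `ecc_read`) and the 8-bit data width are hypothetical and chosen only for illustration.

```python
def _data_positions(n_data):
    """1-based codeword positions that hold data bits (all non-powers of two)."""
    positions, p = [], 1
    while len(positions) < n_data:
        if p & (p - 1):          # nonzero only when p is not a power of two
            positions.append(p)
        p += 1
    return positions


def ecc_encode(data, n_data=8):
    """Return (check_bits, overall_parity) to store alongside `data`."""
    check = 0
    for i, pos in enumerate(_data_positions(n_data)):
        if (data >> i) & 1:
            check ^= pos         # Hamming check bits: XOR of set data positions
    ones = bin(data).count("1") + bin(check).count("1")
    return check, ones & 1       # overall parity makes the total count even


def ecc_read(data, check, overall, n_data=8):
    """Correct a single flipped bit on the fly; flag a double flip."""
    positions = _data_positions(n_data)
    expected_check, _ = ecc_encode(data, n_data)
    syndrome = expected_check ^ check
    total_ones = bin(data).count("1") + bin(check).count("1") + overall
    parity_ok = (total_ones & 1) == 0
    if syndrome == 0 and parity_ok:
        return "ok", data
    if not parity_ok:
        # Odd number of flips: treat as a single-bit error.
        if syndrome in positions:
            data ^= 1 << positions.index(syndrome)   # flip the data bit back
        # Otherwise the flipped bit was an ECC bit, so the data is intact.
        return "corrected", data
    return "uncorrectable", data  # even parity, nonzero syndrome: two flips


# Hypothetical stored word and its ECC bits.
stored = 0b10110010
chk, par = ecc_encode(stored)
one_flip = ecc_read(stored ^ 0b00001000, chk, par)
two_flips = ecc_read(stored ^ 0b00001001, chk, par)
```

In this sketch only the single addressed word is examined on each read, which is what makes the correction cheap enough to perform on every SRAM access.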
While the above conventional approach can provide fast error correction in an SRAM, such an approach can present problems if attempted with a content addressable memory (CAM). This is because of the intrinsic nature of the operation of a CAM, as will now be described in conjunction with FIG. 20.
FIG. 20 shows a packet processing system 2000 that includes a packet processor 2002 and an associated CAM 2004. Within CAM 2004, each addressable location (e.g., entry) stores both a data value 2006-0 as well as a corresponding ECC value 2006-1.
When a key is searched in a CAM 2004, all entries in the CAM 2004 can be searched, and the result output can be either a “miss”, or a “hit” along with the index of a best matching entry “index”. That is, in a search operation, data values for all searched CAM entries are accessed for comparison with a key value. This is quite different from the SRAM case in which a single address is accessed.
Furthermore, unlike SRAM entries, CAM entries can present two different types of errors. As is well understood, in a search operation, a CAM entry can generate a hit (all bits match a key) or a miss (one or more bits do not match the key). If a CAM entry includes one or more errors, it can generate not only a “false miss” (one or more bits erroneously force a mis-match) but also a “false hit” (one or more bits erroneously force a match).
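The two error types can be shown with a small sketch. It assumes a simple binary CAM (no ternary masking) in which the lowest matching index is taken as the best match. The function name `cam_search` and the stored values are hypothetical.

```python
def cam_search(entries, key):
    """Compare the key against every entry; return the lowest matching
    index, or None for a miss. (Assumed priority rule: lowest index wins.)"""
    for index, value in enumerate(entries):
        if value == key:
            return index
    return None


# Hypothetical error-free contents.
entries = [0b1010, 0b0110, 0b1111]

# False miss: a one-bit flip in entry 1 makes it stop matching its own key.
false_miss_cam = list(entries)
false_miss_cam[1] ^= 0b0001
false_miss_result = cam_search(false_miss_cam, 0b0110)

# False hit: a one-bit flip in entry 0 makes it match a key that is not
# stored anywhere in the error-free contents.
false_hit_cam = list(entries)
false_hit_cam[0] ^= 0b0001
false_hit_result = cam_search(false_hit_cam, 0b1011)
```

Because every entry participates in every search, a flipped bit in any entry can change the result, regardless of which key is applied.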
As a result, in the case of a CAM 2004, if even one searched entry anywhere in the CAM has a soft error, it can affect a final result output. Thus, if on-the-fly error correction for a search operation is desired, such a conventional arrangement would require that all searched entries are checked (and corrected) according to their respective ECC bits for every search. This is prohibitively expensive since each CAM row is designed and placed carefully to optimize layout and density, and an ECC operation (which needs access to all bits for its operation) could destroy the carefully placed arrangement of CAM cells.
Consequently, approaches like that for the SRAM shown in FIG. 19 are not suitable for providing SRAM-like on-the-fly error correction for a CAM.
Three general conventional approaches to error handling in CAMs are known.
In a first conventional approach, on-the-fly error correction is abandoned entirely. Instead, error detection or correction is performed in the “background” of the normal CAM operations, utilizing a software or hardware scan. In this scheme, ECC bits are kept with each CAM entry, and every so often a hardware engine (or software routine) can read a next entry in the CAM and detect/correct using the ECC stored in the entry. Such approaches are often called “scrubbing” the CAM entry.
A disadvantage to this first conventional technique is that a CAM could generate incorrect results for a long time before the background scan comes around to correcting the CAM entry containing the error. More precisely, for a CAM block with 4K (4096) entries, if entry numbered X gets an error immediately after the scan has crossed it, it will be another 4K scrub operations before the background scan engine comes around to checking and correcting entry X. If the “scrub” is performed once every 1000 cycles (being executed in the background), this could result in 4 million wrong search results before the error is addressed. Such a relatively high rate of possible error can be unacceptable for several CAM applications. One such application can be CAMs used to search access control lists (ACLs) for filtering incoming packets for network security.
One approach to improving the performance of the first conventional approach can be to increase the scrub rate. However, doing so reduces the performance of the CAM device, and may still present high potential for errors. For example, if 1 scrub was executed in a CAM every other operational cycle (at a hefty 50% overhead), 8,000 wrong search results could be generated before the error was corrected.
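The exposure figures quoted above follow from simple arithmetic, sketched here. The function name `worst_case_exposure_cycles` is hypothetical. The calculation treats every elapsed cycle as a potential search, so it is an upper bound that corresponds to the rounded figures in the text.

```python
def worst_case_exposure_cycles(num_entries, cycles_per_scrub):
    """Cycles that can elapse before the scan returns to an entry that was
    corrupted just after being scrubbed: one full pass over all entries."""
    return num_entries * cycles_per_scrub


# One scrub every 1000 cycles over a 4K-entry block (about 4 million).
background_scan = worst_case_exposure_cycles(4096, 1000)

# One scrub every other cycle, i.e. 50% overhead (about 8,000).
aggressive_scan = worst_case_exposure_cycles(4096, 2)
```

Even at the aggressive rate, the exposure window scales with the number of entries, so a larger CAM block widens it proportionally.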
A second conventional approach will now be described with reference to FIG. 21. FIG. 21 is a block diagram showing a packet processing system 2100 configured for error correction. It is noted that the approach shown is used only for solving the limited problem of detecting false hits, and hence is, strictly speaking, not related to the problem at hand of providing on-the-fly error correction. That is, it is not possible to provide error correction with this approach. However, examining the operation of the system is believed to be helpful in understanding concepts of the various embodiments of the invention described below.
The packet processing system 2100 includes a packet processor 2102, a CAM 2104, and an SRAM 2106. In the technique of FIG. 21, each entry of CAM 2104 can store a data value (e.g., E1, E2). In addition, each entry of SRAM 2106 can store such a data value, along with ECC bits for the data value. In a search operation, in response to an applied key value “key” (and assuming a “hit” result), CAM 2104 can return an index value “indx”. Subsequently, the SRAM 2106 can be read utilizing the index value to access the corresponding data value and ECC value. The data value and ECC value can then be used to verify that the entry stored in SRAM matches the incoming key.
The approach of FIG. 21 is not without disadvantages. In particular, as noted above, the approach is only capable of providing detection for false-hit cases. That is, false misses are not addressed.
For example, suppose a soft-error causes a “false hit” on CAM entry 2106-1 storing value E1 (i.e., a soft-error causes an entry that would otherwise miss to cause a hit). SRAM entry 2108-1 will indicate that data value E1 does not match the key and the packet processor 2102 can properly detect the error. However, the packet processor 2102 cannot immediately correct the error. The error can be corrected by some later write operation to entry 2108-1 (to overwrite erroneous data). A search with the same key would then have to be repeated.
However, this technique does not work well if there is more than one error among the entries of the CAM. That is, if another entry causes a false hit, the same process would be repeated. This can result in delays and non-deterministic search times.
Still further, suppose that CAM entry 2106-2 generates a “false miss”, i.e., data value E2 should have matched the key but there was a soft-error which caused the CAM entry 2106-2 to be a miss. In such a case, a CAM 2104 would return a “miss” indication, and there is no way to know whether such a miss is valid, or the result of an error in some entry (which would have otherwise hit the incoming key).
Thus, the above approach cannot detect (let alone correct) “false miss” cases.
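The asymmetry between the two cases can be seen in a short sketch of the verify-after-search flow. It assumes a binary CAM with lowest-index priority and an SRAM shadow copy that has already been ECC-checked (the ECC step is omitted for brevity). The names `cam_lookup` and `verified_search` and the values chosen for E1 and E2 are hypothetical.

```python
def cam_lookup(cam, key):
    """Return the lowest matching index, or None on a miss (assumed priority)."""
    for index, value in enumerate(cam):
        if value == key:
            return index
    return None


def verified_search(cam, sram, key):
    """Search the CAM, then check the hit against the SRAM copy."""
    index = cam_lookup(cam, key)
    if index is None:
        # No index is returned, so there is nothing to read from the SRAM:
        # a valid miss and a false miss look identical here.
        return "miss", None
    if sram[index] != key:
        return "false_hit_detected", index
    return "hit", index


E1, E2 = 0b1010, 0b0110          # hypothetical stored values
sram = [E1, E2]                  # shadow copy, assumed error-free

# False hit: one flipped bit makes the E1 entry match an unrelated key.
false_hit = verified_search([E1 ^ 0b0001, E2], sram, 0b1011)

# False miss: one flipped bit stops the E2 entry from matching key E2.
false_miss = verified_search([E1, E2 ^ 0b0001], sram, E2)
```

The SRAM check only runs when the CAM returns an index, which is why the false miss passes through unnoticed.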
A third conventional approach is shown in FIG. 22.
FIG. 22 shows a system 2200 that employs error-correction based on majority voting. In the example shown, a same database can be stored in three different CAMs (2202-0, 2202-1 and 2202-2). A packet processor (not shown) issuing searches to the CAMs (2202-0, 2202-1 and 2202-2) can evaluate a search result by taking a majority result. Such an approach may be sufficient for a single-bit error (occurring over any of the three CAMs). However, the approach may not be sufficient in the event of more than a single-bit error, because a one-bit error occurring anywhere (not just in the same entry location) in a majority of the CAMs (in this case two CAMs) can defeat the scheme. This is described in more detail in the following example.
Referring still to FIG. 22, the three CAMs (2202-0, 2202-1 and 2202-2) would ideally contain the same database. Assume that three entries in each CAM (2204-01-03, 2204-11-13, and 2204-21-23) are supposed to store data values E1, E2 and E3, respectively. Further, assume that an applied key is supposed to match value E3. In such an arrangement, the correct behavior of the above entries of each CAM (2202-0, 2202-1 and 2202-2) would be to generate a miss, miss, and hit respectively.
However, if there is a “false hit” error in entry 2204-01 of CAM 2202-0 and a “false hit” error in entry 2204-12 of CAM 2202-1, a search would lead to three different results, and hence could confuse the majority voting logic. Thus, this arrangement illustrates how two one-bit errors anywhere in the CAMs can defeat the scheme.
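The failure case above can be reproduced in a small sketch. It assumes binary CAMs with lowest-index priority and a voter that requires at least two identical results. The names `cam_search` and `majority_vote` and the values chosen for E1, E2 and E3 are hypothetical, picked so that E1 and E2 each differ from the key by one bit.

```python
from collections import Counter


def cam_search(entries, key):
    """Return the lowest matching index, or None on a miss (assumed priority)."""
    for index, value in enumerate(entries):
        if value == key:
            return index
    return None


def majority_vote(results):
    """Return ("agree", value) if at least two of the results are identical,
    otherwise ("no_majority", None)."""
    value, count = Counter(results).most_common(1)[0]
    if count >= 2:
        return "agree", value
    return "no_majority", None


E1, E2, E3 = 0b0110, 0b0101, 0b0111   # hypothetical database
key = E3                               # should hit only the third entry

cam0 = [E1 ^ 0b0001, E2, E3]           # one-bit error: first entry false-hits
cam1 = [E1, E2 ^ 0b0010, E3]           # one-bit error: second entry false-hits
cam2 = [E1, E2, E3]                    # error-free copy

# Two one-bit errors in different CAMs: three different results.
two_errors = majority_vote([cam_search(c, key) for c in (cam0, cam1, cam2)])

# A single one-bit error is outvoted by the two clean copies.
one_error = majority_vote([cam_search(c, key) for c in (cam0, cam2, cam2)])
```

Note that the two errors sit in different entry locations of different CAMs, yet together they still prevent any two copies from agreeing.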
A variation of the above third conventional approach can be to repeat a database within one CAM device. That is, a database can be replicated inside multiple blocks within a CAM. Logic can compare the search results from such different blocks according to majority voting. This approach again suffers from the limitation that two one-bit errors anywhere in the CAM blocks can defeat the scheme, and is hence not as powerful as protecting each entry against one-bit errors. Such an arrangement is shown in U.S. Pat. No. 7,254,748, titled “ERROR CORRECTING CONTENT ADDRESSABLE MEMORY” and issued to Wright et al. on Aug. 7, 2007.
In light of the above, it would be desirable to arrive at some way of providing error correction in a CAM device and/or system that does not suffer from the above drawbacks of the conventional approaches.