In standard computer memory, e.g., random access memory (RAM), a user provides a memory address and the RAM returns a data word stored at the memory address. In contrast to standard computer memory, content-addressable memory (CAM) receives a data word from the user and searches the entire CAM array in a single operation to determine whether the data word is stored anywhere within the CAM array. Because CAM searches its entire memory in a single operation, it is much faster than RAM in search applications.
There are two types of CAM: binary CAM and ternary CAM (TCAM). Binary CAMs provide for the storing and searching of binary bits, i.e., zero and one (0, 1), comprising a data word. Ternary CAMs provide for the storing of three states, i.e., zero, one, and a “don't care” bit (0, 1, X). The “don't care” bit of ternary CAM allows for greater flexibility in searching data words. For example, a ternary CAM may store the data word “11XX0”, which will match any of the searched-for data words “11000”, “11010”, “11100”, and “11110”.
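The ternary matching rule can be illustrated with a minimal software sketch. The function name `ternary_match` and the two non-matching search words are hypothetical and are introduced only for illustration; an actual TCAM performs this comparison in hardware across all stored words at once.

```python
def ternary_match(stored: str, search: str) -> bool:
    """Return True when every stored bit equals the search bit or is 'X'."""
    if len(stored) != len(search):
        return False
    # An 'X' in the stored word matches either a 0 or a 1 in the search word.
    return all(s == "X" or s == q for s, q in zip(stored, search))


stored_word = "11XX0"
# The first four search words come from the example in the text; the last
# two are hypothetical non-matching words added for contrast.
search_words = ["11000", "11010", "11100", "11110", "11001", "01000"]
matches = [w for w in search_words if ternary_match(stored_word, w)]
print(matches)
```

Only the four search words that agree with the stored word in every position other than the “don't care” positions are reported as matches.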
CAM is often used in computer network devices. For example, when a network switch receives a data frame from one of its ports, it updates an internal address table with the frame's source address and the receiving port's identifier. The network switch then looks up the destination address of the data frame in the internal address table to determine a port to which the data frame should be forwarded, and sends the data frame to its destination address on that port. The internal address table is usually implemented by a binary CAM so that the data frame is quickly forwarded to the proper port, reducing the latency of the network switch.
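The learn-and-forward behavior described above may be sketched as follows. This is an illustrative software model only: a Python dictionary stands in for the binary CAM, the class and method names are hypothetical, and the handling of an unknown destination (flooding to all other ports, which is conventional Ethernet switch behavior) is an assumption that is not described in the text.

```python
class LearningSwitch:
    """Toy model of a switch whose address table stands in for a binary CAM."""

    def __init__(self, num_ports: int):
        self.num_ports = num_ports
        self.address_table = {}  # source address -> port identifier

    def receive(self, in_port: int, src: str, dst: str) -> list:
        # Update the table with the frame's source address and receiving port.
        self.address_table[src] = in_port
        # Look up the destination address to find the forwarding port.
        out_port = self.address_table.get(dst)
        if out_port is None:
            # Assumption: an unknown destination is flooded to every other port.
            return [p for p in range(self.num_ports) if p != in_port]
        return [out_port]


switch = LearningSwitch(num_ports=4)
first = switch.receive(in_port=0, src="aa", dst="bb")   # "bb" not yet learned
second = switch.receive(in_port=1, src="bb", dst="aa")  # "aa" learned on port 0
third = switch.receive(in_port=0, src="aa", dst="bb")   # "bb" learned on port 1
```

In a hardware switch the lookup in `receive` corresponds to a single CAM search rather than a software hash lookup.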
TCAM is frequently used in network routers, where each address comprises a network address, which varies in size depending on the subnet configuration, and a host address, which occupies the remaining bits. The network address and the host address are distinguished by a network mask for each subnet of the network. To route information to its destination in the network, a router looks up a routing table that contains each known destination address, the associated network mask, and the routing information needed to route packets to the destination address. A TCAM performs this lookup rapidly: it masks the host portion of the address with “don't care” bits and compares the destination address in one operation, quickly retrieving the routing information needed to route packets to the destination address.
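A masked routing lookup of this kind can be modeled in software as shown below. This is a sketch under stated assumptions: the routes and next-hop labels are hypothetical, the entries are ordered longest prefix first so that the first matching entry wins (a common TCAM priority convention that the text does not specify), and the sequential loop stands in for what a TCAM does in one parallel operation.

```python
import ipaddress


def make_entry(cidr: str, next_hop: str):
    """Build a (value, mask, prefix_length, next_hop) entry from a CIDR string."""
    net = ipaddress.ip_network(cidr)
    return (int(net.network_address), int(net.netmask), net.prefixlen, next_hop)


# Hypothetical routing table; host bits are "don't care" (mask bit = 0).
routes = [
    make_entry("10.0.0.0/8", "port A"),
    make_entry("10.1.0.0/16", "port B"),
    make_entry("0.0.0.0/0", "default"),
]
# Assumption: higher-priority (longer-prefix) entries are placed first.
tcam = sorted(routes, key=lambda entry: entry[2], reverse=True)


def lookup(table, address: str):
    """Return the routing information of the first entry matching the address."""
    addr = int(ipaddress.ip_address(address))
    for value, mask, _prefix_len, next_hop in table:
        # Masking removes the host portion before the comparison.
        if (addr & mask) == value:
            return next_hop
    return None


print(lookup(tcam, "10.1.2.3"))
print(lookup(tcam, "10.2.0.1"))
print(lookup(tcam, "192.168.1.1"))
```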
When searching CAM (binary or ternary), search data is loaded onto search lines and compared with stored words in the CAM. During a search-and-compare operation, the CAM performs a fully parallel search and generates a match or mismatch signal associated with each stored word, indicating whether or not the search word matches that stored word.
To allow this fast parallel comparison of all stored words against a single search word, each CAM word contains dedicated search hardware. Each CAM cell contains additional bit-comparison transistors and a storage element, which is typically implemented as a Static Random Access Memory (SRAM) cell. This added circuitry is combined across the CAM word with a match-line (ML) to produce a match or mismatch signal for each CAM word. This search hardware allows the entire contents of the CAM array to be searched in a single clock cycle, i.e., all stored CAM words are searched in parallel. Thus, in contrast to standard memory (e.g., SRAM and DRAM), which would typically require 1K clock cycles to complete a search of 1K words of memory, a CAM has the ability to search all entries simultaneously in a single clock cycle.
Unfortunately, as technology scales to submicron geometries, random device variation (RDV) is becoming more prominent. RDV of parameters such as transistor length, transistor width and transistor threshold voltage can be significant even in identically designed neighboring devices. The effects of RDV are especially evident in the design of semiconductor memories. Because most memories rely on sense amplifiers to detect small voltage signals on largely capacitive array lines, RDV in the memory cells as well as sense-amplifier devices can produce incorrect results. To improve reliability, memory designers tune their sensing circuits conservatively, thereby trading off performance in order to maintain a large sensing margin for reliable operation.
In advanced technologies (e.g., 100 nm and smaller gate geometry), RDV is becoming a major bottleneck for improving performance. As device variation increases, timing uncertainty for signal arrival and data capture increases, requiring larger data capture margins, and therefore limiting performance.
Due to its single-ended nature, the ML sensing performed during the CAM search operation is even more sensitive to RDV than the differential sensing used in the SRAM read circuitry. Thus, to maintain reliable operation, most ML sensing schemes employ full-swing sensing, which is both slow and power-inefficient.
CAM design tradeoffs thus include search access time, power, and density. To reduce power consumption, a two-stage sensing scheme is sometimes used for searching the CAM. The two-stage sensing scheme includes a pre-compare (e.g., pre-search) and a main-compare (e.g., main-search). In the pre-compare, a small number of the bits in each CAM word are compared to the corresponding bits in the search word prior to the main ML being precharged for the power-intensive main-compare. When the pre-compare shows a miss for a particular CAM word, the main-compare is not performed for that word, thus saving the power associated with performing the main-compare.
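The power saving of the two-stage scheme can be illustrated with a simple behavioral model. The sketch below is not a circuit description: the function name, the choice of two pre-compare bits, and the stored words are all hypothetical, the loop is sequential whereas the hardware evaluates all words in parallel, and the count of main-compares is used only as a rough proxy for the number of main MLs that would be precharged.

```python
def two_stage_search(stored_words, search_word, pre_bits=2):
    """Return (indices of matching words, number of main-compares performed)."""
    matches = []
    main_compares = 0
    for index, word in enumerate(stored_words):
        # Pre-compare: check only a small number of bits of each word.
        if word[:pre_bits] != search_word[:pre_bits]:
            continue  # Pre-compare miss: the main-compare is skipped.
        # Main-compare: performed only for words that passed the pre-compare.
        main_compares += 1
        if word[pre_bits:] == search_word[pre_bits:]:
            matches.append(index)
    return matches, main_compares


stored = ["110010", "111111", "000010", "110011", "010010"]
matches, main_compares = two_stage_search(stored, "110010")
print(matches, main_compares)
```

In this hypothetical array, two of the five stored words miss in the pre-compare, so only three main-compares are performed instead of five.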
Traditional timing methodology requires completing the pre-compare before beginning the main-compare. This is becoming problematic, however, when margining (e.g., designing) for variation in the pre-compare CAM cell and/or pre-compare sense circuit. In particular, margining for the slowest statistically relevant pre-compare case results in the main-compare starting later than necessary for most cases. Timing uncertainty increases with device variation as device sizes shrink, and this large timing uncertainty on the pre-search completion is impacting overall CAM performance by delaying the start of the main-compare.
Accordingly, there exists a need in the art to overcome the deficiencies and limitations described hereinabove.