Content Addressable Memory (CAM) is a memory tailored for search operations. It contains dedicated comparison circuitry to search through a table of stored data within a single clock cycle [1]. The conventional CAM cell, also known as a Binary CAM (BCAM) cell, consists of a memory unit, typically SRAM, and bit comparison circuitry. In Ternary Content Addressable Memory (TCAM), an additional memory unit stores a third, "don't care" state, enabling wildcard matching at the cost of larger area and power overheads.
The 8T NOR CAM cell [1] implements the comparison operation with four transistors, M1 through M4. The two transistor pairs, M1/M3 and M2/M4, form pulldown paths connecting the match line (ML) to ground. The comparison operation begins by precharging ML. If the search line (SL) matches the stored data D, both pulldown paths are OFF, disconnecting ML from ground, and ML stays charged. If SL and D mismatch, one pair of transistors turns ON and discharges ML. When multiple cells are joined in parallel to form a CAM word, ML discharges if any cell misses. The search speed is dictated by the discharge rate of ML. In the worst case, a single-bit miss, ML can discharge only through one cell, which limits the search speed.
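The parallel pulldown behavior described above can be summarized in a short behavioral sketch (not a circuit model); the function name and bit encoding are illustrative assumptions, not from [1]:

```python
# Behavioral sketch of a NOR CAM word: ML is precharged high and
# discharges if ANY cell mismatches (a wired-NOR of per-cell pulldowns).
def nor_cam_match(stored_word, search_word):
    """Return True (ML stays charged) only if every bit matches."""
    # Each mismatching cell turns ON one pulldown pair (M1/M3 or M2/M4),
    # so a single miss is enough to discharge ML.
    return all(d == sl for d, sl in zip(stored_word, search_word))

print(nor_cam_match([1, 0, 1, 1], [1, 0, 1, 1]))  # True  (match: ML stays high)
print(nor_cam_match([1, 0, 1, 1], [1, 0, 0, 1]))  # False (one miss discharges ML)
```

The single-bit-miss worst case corresponds to the second call: only one cell's pulldown path carries the discharge current.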
The 9T NAND CAM cell [1] implements the comparison operation through two transistors (MD and \MD) and a pass transistor (MP). The search operation begins by applying a high or low voltage to SL and the opposite voltage to its complement \SL. During a match, node B charges through either MD or \MD, depending on the values of D and SL, turning ON the pass transistor MP and allowing current to flow. During a miss, node B remains low and keeps MP OFF. Multiple cells are joined to form a word by connecting their pass transistors in series. The search operation begins by precharging one end of the word. Match current can flow through the ML only if all of the cells match. Since the ML resistance is directly proportional to the word length, a long word reduces the match current, restricting the search speed. Moreover, the NAND CAM has a potential charge-sharing issue at ML. When a pass transistor is ON, charge is shared with the adjacent intermediate ML nodes. Thus, when all bits match except the last cell, the charge is shared among all intermediate nodes up to the last cell. This charge sharing may drop the precharged node sufficiently to produce a false match. To prevent such an error, the intermediate ML nodes are also precharged to VDD, at the cost of extra area overhead and power dissipation.
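In contrast to the NOR word, the NAND word behaves like a series AND chain: match current reaches the far end of ML only if every pass transistor along the way is ON. A minimal behavioral sketch (names illustrative, charge sharing not modeled):

```python
# Behavioral sketch of a NAND CAM word: the pass transistors (MP) of all
# cells are connected in series, so the match path is an AND of all bits.
def nand_cam_match(stored_word, search_word):
    """Match current flows through the series ML only if ALL cells match."""
    for d, sl in zip(stored_word, search_word):
        if d != sl:       # node B stays low, so this cell's MP stays OFF
            return False  # the series path is broken: no match current
    return True           # every MP is ON; current reaches the far end
```

Note that the sketch short-circuits on the first miss, whereas in the real circuit the charge-sharing hazard arises precisely because the chain conducts partway before the mismatching cell blocks it.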
To address footprint, speed, and power challenges, nano-electronic CAMs have been explored. In [9], a novel TCAM cell design replaces the volatile SRAM with Magnetic Tunnel Junctions (MTJs) to achieve zero standby power consumption. The design consists of two access transistors and two MTJs. The MTJs (D1 and D2) are joined in parallel and connected to ML through the access transistors (M1 and M2). Each MTJ/access-transistor pair forms a pulldown path connecting ML to \ML. The stored data D is programmed as two resistance states in the two MTJs: a high resistance state represents logic '0', and a low resistance state represents logic '1'. Match or miss is determined by comparing the cell current against a reference current. During a search operation the cells are evaluated sequentially; a word match is indicated only if all cells evaluate to a match. The advantages of the design are low area and zero standby power: it employs only two transistors and two MTJs, making it three times smaller than the conventional NOR CAM. The design was further improved to reduce active power consumption by power gating a row as soon as a miss is found [10]. Other variants such as [3] have also been proposed. However, the drawbacks include low search speed (due to bit-by-bit evaluation) and potential errors (due to poor TMR and high variability). In addition, variations in the access transistors and wire resistance make sensing a challenge.
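The bit-serial, current-referenced search described above can be sketched behaviorally. The resistance, bias, and reference values below are illustrative assumptions, not figures from [9]; the point is only the comparison of cell current against a reference and the sequential evaluation with early abort (mirroring the power-gating improvement of [10]):

```python
# Illustrative (assumed) parameters: '1' -> low-resistance MTJ state,
# '0' -> high-resistance state; I_REF sits between the two current levels.
R_LOW, R_HIGH = 1e3, 3e3      # ohms (assumed)
V_BIAS = 0.5                  # volts across the selected pulldown path (assumed)
I_REF = V_BIAS / 2e3          # reference current between match and miss levels

def bit_matches(stored_bit, search_bit):
    # Simplification: a match steers the read current through the
    # low-resistance path, a miss through the high-resistance one.
    r = R_LOW if stored_bit == search_bit else R_HIGH
    return (V_BIAS / r) > I_REF   # cell current above reference -> match

def word_search(stored, key):
    # Cells are evaluated sequentially; stop the row at the first miss,
    # as in the power-gated variant of [10].
    for d, sl in zip(stored, key):
        if not bit_matches(d, sl):
            return False
    return True

print(word_search([1, 0, 1], [1, 0, 1]))  # True  (all cell currents exceed I_REF)
print(word_search([1, 0, 1], [1, 1, 1]))  # False (second cell current below I_REF)
```

In practice the margin between the two current levels depends on the TMR ratio, which is why poor TMR and device variability translate directly into sensing errors.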
A Domain Wall (DW) based BCAM cell design was proposed in [4]. The design follows the conventional NOR CAM architecture for ultra-fast search, replacing the SRAM with a nonvolatile DW device to eliminate standby power. The cell consists of two DW devices (R0 and \R0) and a dedicated comparison circuit. The comparison circuit is composed of a sense amplifier, four pulldown transistors (M1 through M4), and precharge and equalization transistors. Due to the resistance difference, the sense amplifier is biased to '1' or '0' depending on the data stored in R0. The search operation begins by precharging ML. During a match, both pulldown paths stay OFF, disconnecting ML from ground. During a miss, one pair of transistors (either M1/M3 or M2/M4) turns ON, depending on the values of SL and D, and discharges ML. However, the design suffers from the large area overhead of the dedicated sensing circuitry.
Another DWM CAM [5] employs a complementary pair of magnetic nanowires, each pair representing one word, to obtain reliable and fast access for CAM applications. The comparison circuit is based on a precharge sense amplifier. The CAM includes two MTJs connected together, forming the write heads. Because the write current pulse flows through these two MTJs in opposite directions, complementary polarities are nucleated in the nanowires. A critical challenge for complementary magnetic nanowires is synchronizing the domain wall positions. To address it, the same current pulse is applied to both nanowires, and identical physical notches are built into the nanowires to pin the DWs and enable their synchronization. A pair of read MTJs is used to read each bit of the storage element. This DWM CAM requires significant overhead for the CMOS sense circuit, and the update operation is time intensive due to the serial storage of data.
NAND flash memory typically contains stacked floating-gate transistors used as memory elements. Information is stored as the threshold voltage of a transistor: the absence or presence of charge on the floating gate corresponds to '1' or '0', respectively. Programming is performed by applying an appropriate voltage to the transistor gate. The threshold voltage of a transistor storing '1' is below 0V, whereas a transistor storing '0' has a threshold voltage above 0V. By stacking bits vertically, NAND flash achieves very high density [12]. However, the stacked design poses significant sensing challenges, as the current difference between a '1' and a '0' transistor is in the range of nanoamperes (nA).
The NAND sensing operation is based on the fact that the bitline (BL) capacitance discharges at different rates for transistors storing '1' and '0'; sensing is therefore reference-less. At the start of sensing, the BL is precharged to VDD. Next, the read voltage (typically 0V) is applied to the selected transistor's gate, while a pass voltage (typically 4-5V) is applied to the unselected transistors in the string. The BL capacitor discharges if the stored value is '1'. Next, a sense voltage (Vsen) is applied to SEL. The magnitude of Vsen is chosen such that Vsen − VBL > Vth only if the stored value is '1'. This turns ON the SEL transistor and discharges the output SO. If the stored value is '0', the BL stays close to its precharged value, the SEL transistor stays OFF, and SO remains at VDD. Note that sensing is slow due to the nA current range.
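The sensing sequence can be condensed into an idealized behavioral sketch. The supply, threshold, and Vsen values below are illustrative assumptions, and the slow nA-range BL discharge is treated as instantaneous and complete:

```python
# Idealized sketch of reference-less NAND sensing (assumed voltages).
VDD = 1.8        # assumed precharge/supply voltage
VTH_SEL = 0.5    # assumed SEL transistor threshold
V_SEN = 1.0      # chosen so that V_SEN - V_BL > VTH_SEL only after BL discharges

def sense(stored_bit):
    v_bl = VDD                          # 1) precharge BL to VDD
    if stored_bit == 1:                 # 2) read voltage (0V) on selected gate:
        v_bl = 0.0                      #    a '1' cell (Vth < 0V) conducts and
                                        #    discharges the BL (idealized to 0V)
    sel_on = (V_SEN - v_bl) > VTH_SEL   # 3) apply Vsen to SEL
    return 0.0 if sel_on else VDD       # 4) SEL ON discharges output SO;
                                        #    otherwise SO stays at VDD

print(sense(1))  # 0.0  (SO discharged  -> stored '1')
print(sense(0))  # 1.8  (SO stays at VDD -> stored '0')
```

The key design constraint is the choice of Vsen: it must overdrive SEL for a discharged BL but not for a BL still near VDD, which is what makes the scheme reference-less.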