1. Field of the Invention
The present invention relates to a content-addressable-memory (CAM) device that stores three values of “1”, “0”, and “X (undefined)” and is used for searching data, and to a remedial method for the CAM.
2. Description of the Related Art
A CAM that stores three values of “1”, “0”, and “X” is referred to as a ternary content addressable memory (TCAM). The TCAM is one of the memory large scale integrations (LSIs) that have been frequently used in recent years for address path search in network apparatuses such as routers and switches. A conventional CAM is schematically explained below with reference to FIGS. 24 to 28. FIG. 24 is a block diagram of an example of a constitution of a network apparatus. FIG. 25 is a diagram of a packet constitution in an Ethernet (registered trademark) frame. FIG. 26 is a block diagram for explaining a main part constitution of a CAM shown in FIG. 24. FIG. 27 is a circuit diagram showing a constitution of a CAM array shown in FIG. 26. FIG. 28 is a truth table for explaining a relation between a stored value and a search value of a TCAM unit cell shown in FIG. 27.
In the network apparatus shown in FIG. 24, a CAM 100, a control unit (Central Processing Unit (CPU)) 200, an action memory (Static Random Access Memory (SRAM)) 300, a packet buffer (Dynamic Random Access Memory (DRAM)) 400, and an interface unit (Ingress, Egress) 500 of the Ethernet are connected via a switching circuit 600. The CAM 100 serves as a searching unit and includes a MAC table 100a, an IP table 100b, and a filtering table 100c.
As shown in FIG. 25, a packet in the Ethernet frame is generally standardized and includes a “Pre-amble” field, an “SFD” field, a “Destination” field, a “Source” field, a “Type/Length” field, a “Transmitting Message” field, a “PAD” field, and an “FCS” field.
In FIG. 24, a packet on the network is captured from the Ingress of the interface unit 500. A header section (Destination and Source) in the packet captured is sent to the CAM 100 for search. A content section (Transmitting Message) in the packet is sent to the packet buffer (DRAM) 400 to be held therein until search ends.
Usually, search processing is required a plurality of times for one packet. For example, search of a transmission destination and a transmission source of a MAC using the MAC table 100a (L2 search), search of a transmission destination and a transmission source of an IP using the IP table 100b (L3 search), and filtering of a TCP and a UDP using the filtering table 100c (filtering of L4) are performed. Thus, search is performed five times per packet. If a data sequence matching the IP transmission destination searched is found in the CAM 100, the CAM 100 outputs an address of the IP transmission destination matching the data sequence. Then, the CAM 100 reads out a rule that should actually be taken from the action memory 300 with the address as an index address.
In this way, the CAM 100 has a function of judging at high speed whether there is an address matching a data sequence requested to be searched in a search operation and outputting the matching address to the outside. As the action memory 300, usually, an SRAM is used. Representative actions include, other than designation of the next hop address for the transmission destination, filtering for discarding an unnecessary packet, Quality of Service (QoS) for prioritizing packet processing, and the like. If an action is determined, the CAM 100 rewrites the header section, reproduces the packet by attaching contents stored in the packet buffer 400 to the packet, and transmits the packet from the Egress of the interface unit 500 to the Internet environment.
A constitution and operations of the CAM 100 shown in FIG. 24 are schematically explained with reference to FIGS. 26 to 28. In FIG. 26, the MAC table 100a, the IP table 100b, and the filtering table 100c constituting the CAM 100 shown in FIG. 24 basically include a CAM array 110, a row decoder 106, a sense amplifier 105, a search-line driving circuit 102, a match amplifier 103, and a priority encoder 104, which are arranged in the outer periphery of the CAM array 110, respectively. The row decoder 106, the sense amplifier 105, and the search-line driving circuit 102 are connected to an input pin 101 serving as an input node. The match amplifier 103 is also referred to as “sense amplifier” like the sense amplifier 105. However, to clarify a function thereof, the match amplifier 103 is referred to as “match amplifier”.
The input pin 101 is used for data input and output for writing data in and reading data out from the CAM and, in addition to its use for search data input, is also used as an address input pin. In the writing in the CAM, when writing data is given from the input pin 101, the data is transferred to a bit line (BL) via the sense amplifier 105. When a writing address is given from the input pin 101, the row decoder 106 drives a corresponding word line (WL) to complete the writing. Once the writing in all rows of the CAM array 110 is completed, almost all operations after that are search operations.
Actually, in the CAM used in the network apparatus, the search operations occupy about 90% of all operations, with writing and readout making up the rest. In the search operation, when a data sequence to be searched is given to the same input pin 101 as a search request, the search-line driving circuit 102 drives a search line SL. A result of judgment on whether the data sequence to be searched and a data sequence in the CAM array 110 coincide with each other appears on a common match line ML by a unit of data sequence. The match amplifier 103 amplifies a response of the match line ML at high speed. The response is transmitted to the priority encoder 104. A final matching address is output to the outside from a search-result output pin 107, which is an output node.
As shown in FIG. 27, the CAM array 110 is constituted by arranging a TCAM unit cell 111, which holds three values of “1”, “0”, and “X”, in row and column directions. The TCAM unit cell 111 holds the three values of “1”, “0”, and “X” in the SRAM using two bits. A bit line BL and a word line WL are used for writing and readout. This operation is basically the same as the operation of the SRAM.
In the search operation, the search line SL is driven on the basis of search object data as described above. A result of matching comparison appears on a match line ML. The match line ML is commonly used with the data sequence of the CAM controlled by the word line WL. The match line ML is pre-charged to a high level before starting the search operation. Only the match line ML of data sequence matching (HIT) in comparison of all bits maintains the high level (FIG. 28: ML=H). Conversely, when at least one bit mismatches (MISS), the match line ML discharges to be at a low level (FIG. 28: ML=L).
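The match-line behavior described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the names `match_line`, `rows`, and `results` are hypothetical, a stored word is represented as a string of “1”, “0”, and “X”, and only a stored “X” (not a masked search bit) is modeled, since the truth table of FIG. 28 is not reproduced here.

```python
# Hypothetical software model of one TCAM row; all names are illustrative.
X = "X"  # third stored value: matches either search bit


def match_line(stored: str, key: str) -> bool:
    """Return True if ML stays high (HIT), False if it discharges (MISS)."""
    if len(stored) != len(key):
        raise ValueError("stored word and search key must have the same width")
    ml_high = True  # the match line is pre-charged to the high level
    for stored_bit, key_bit in zip(stored, key):
        if stored_bit != X and stored_bit != key_bit:
            ml_high = False  # one mismatching bit is enough to discharge ML
    return ml_high


rows = ["1010", "10XX", "0XXX"]
results = [match_line(row, "1011") for row in rows]
print(results)  # [False, True, False]
```

In hardware all rows are compared at once; the loop here only stands in for that parallel comparison.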
Usually, since the third value “X” is not used in the case of the L2 search, only one address holds a matching data sequence and the match lines ML of all the other addresses are discharged. In contrast, in the case of an application using the third value “X”, a plurality of addresses often hit (match) simultaneously (see FIG. 28). In this case, the priority encoder 104 gives priority to the address with the smallest value and encodes and outputs that address.
Therefore, usually, an application keeps the data sequences sorted so that a more specific data sequence, one having fewer “X” values, is placed at a smaller address number (Longest Prefix Match). In the priority encoder 104, respective cells communicate with one another in up and down directions. When a large number of matches (hits) occur, the priority encoder 104 first judges whether there is a hit above or below each cell and then outputs the final encoded address.
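The combination of smallest-address priority and sorted entries can be sketched as follows. This is an illustrative model, not the circuit of the priority encoder 104: the helper names `ternary_match`, `priority_encode`, and `lookup` and the 4-bit table contents are assumptions made for the example.

```python
from typing import List, Optional


def ternary_match(stored: str, key: str) -> bool:
    """A stored 'X' matches either key bit; other bits must be equal."""
    return all(s == "X" or s == k for s, k in zip(stored, key))


def priority_encode(match_lines: List[bool]) -> Optional[int]:
    """Return the smallest address whose match line is high, or None."""
    for address, hit in enumerate(match_lines):
        if hit:
            return address
    return None


# Entries are sorted so that more specific words sit at smaller addresses.
table = ["1011", "10XX", "1XXX"]


def lookup(key: str) -> Optional[int]:
    return priority_encode([ternary_match(row, key) for row in table])


print(lookup("1011"))  # 0 (all three rows hit, the most specific one wins)
print(lookup("1000"))  # 1
print(lookup("1100"))  # 2
print(lookup("0000"))  # None
```

Because the encoder only looks at address order, the longest-prefix result depends entirely on how the entries were sorted when they were written.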
Such a CAM executes searches simultaneously and in parallel, unlike the conventional search according to the tree method and the hash method. Thus, it is possible to perform processing at high speed and in a fixed time. However, since the simultaneous parallel searches use an entire memory area as an operation area, the entire memory area is activated simultaneously. Power consumption at the time of the search operation poses a significant problem in a present-day large capacity CAM with 18 M bits. Specifically, whereas power consumption of the usual SRAM is about 1 watt to 2 watts, power consumption at the time when, for example, the 18 M bit TCAM performs a search operation in the order of 100 megahertz is equal to or larger than 10 watts. Basically, power consumption is proportional to an activation area at the time of the operation. To achieve low power consumption, it is important to find how the operation area can be reduced without sacrificing the high-speed search performance. From such a viewpoint, various studies have been performed (e.g., K. J. Schultz and P. G. Gulak, “Fully Parallel Integrated CAM/RAM Using Pre-classification to Enable Large Capacities,” IEEE Journal of Solid-State Circuits, vol. 31, No. 5, pp. 689-699, May 1996, and H. Noda, K. Inoue, M. Kuroiwa, F. Igaue, K. Yamamoto, A. Hachisuka, H. J. Mattausch, T. Koide, A. Amo, S. Soeda, I. Hayashi, F. Morishita, K. Dosaka, K. Arimoto, K. Fujishima, K. Anami, and T. Yoshihara, “A Cost-Efficient High-Performance Dynamic TCAM with Pipeline Hierarchical Searching and Shift Redundancy Architecture,” IEEE Journal of Solid-State Circuits, vol. 40, No. 1, pp. 245-253, January 2005).
As it is understood from the above explanation, there are two causes for the problem of power consumption involved in the search operation by the CAM. First, a data sequence to be searched is converted into activation of the search lines SL. As described above, all the search lines SL on the CAM array are activated simultaneously and in parallel to one another. Second, all the match lines ML of the mismatching data sequences are discharged because of the activation of the search lines SL. As described above, almost all the match lines ML repeat charging and discharging. Actually, if the search lines SL and the match lines ML are removed, a structural operation is the same as that of the SRAM, and power consumption is not at a problematic level.
A plurality of search operations are required for one packet. On the other hand, when a capacity of the CAM was as small as 1 M bit, different CAM LSIs were prepared for respective applications such as for L2 and L3 and search was executed in a minimum activation area first for L2 and then for L3. Thus, power consumption was not a significant problem.
Thus, methods of realizing low power consumption by following the example described above, even in a present-day large capacity CAM with 18 M bits, are proposed in, for example, U.S. Pat. No. 6,324,087 (FIG. 29) and U.S. Pat. No. 6,470,418 (FIG. 30). FIGS. 29 and 30 are block diagrams of conventional examples for realizing low power consumption of a large capacity CAM.
In FIG. 29, reference signs 110a to 110d denote CAM sub-arrays. Reference signs 102a to 102d denote search-line driving circuits, which are connected in parallel to a bus on which a data sequence requested to be searched is arranged. Reference signs 103a to 103d denote match amplifiers. Outputs of the match amplifiers 103a to 103d are sent to an output bus in parallel. The priority encoder 104 extracts and outputs the outputs to the outside. An address translation logic 120 that can designate four sub-arrays individually is provided.
In this constitution, the address translation logic 120 issues bank address signals BS_0, BS_1, BS_2, and BS_3. This allows the four sub-arrays to perform a dividing operation. For example, the BS_0 sub-array is defined as a sub-array for L2 and the BS_1 sub-array is defined as a sub-array for IP and the bank address signals BS_0, BS_1, BS_2, and BS_3 are issued from the address translation logic 120, respectively, at the time of the search operation. This makes it possible to prevent all the sub-arrays from being activated simultaneously and realize low power consumption.
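The effect of the bank address signals can be sketched in software by counting how many rows are activated per search. This is a rough model under stated assumptions: `search_banked`, the bank contents, and the use of the activated row count as a stand-in for power (following the statement above that power is proportional to the activation area) are all illustrative and are not taken from U.S. Pat. No. 6,324,087.

```python
from typing import List, Tuple


def ternary_match(stored: str, key: str) -> bool:
    return all(s == "X" or s == k for s, k in zip(stored, key))


def search_banked(banks: List[List[str]], bank_select: List[bool],
                  key: str) -> Tuple[List[int], int]:
    """Search only the selected banks; return (hit addresses, activated rows)."""
    hits: List[int] = []
    activated_rows = 0
    base = 0  # global address of the first row of the current bank
    for rows, selected in zip(banks, bank_select):
        if selected:
            activated_rows += len(rows)
            for offset, row in enumerate(rows):
                if ternary_match(row, key):
                    hits.append(base + offset)
        base += len(rows)
    return hits, activated_rows


# Four hypothetical sub-arrays corresponding to BS_0 to BS_3.
banks = [["1010", "1011"], ["10XX", "0XXX"], ["XXXX", "1XXX"], ["0000", "1111"]]

one_bank = search_banked(banks, [False, True, False, False], "1011")
all_banks = search_banked(banks, [True, True, True, True], "1011")
print(one_bank)   # ([2], 2)
print(all_banks)  # ([1, 2, 4, 5], 8)
```

Selecting one bank out of four activates a quarter of the rows in this toy table, which is the saving the dividing operation aims at.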
In FIG. 30, a CAM array is divided into two and the respective divided arrays include four sub-arrays as shown in FIG. 29. In the figure, a search request (1/2) is input to a first divided array on the left and a search request (2/2) is input to a second divided array on the right. Outputs of match amplifiers 103a_1, 103b_1, 103c_1, and 103d_1 in the first divided array on the left are given to corresponding match lines of CAM sub-arrays 110a_2, 110b_2, 110c_2, and 110d_2 in the second divided array on the right. Outputs of match amplifiers 103a_2, 103b_2, 103c_2, and 103d_2 in the second divided array are sent to an output bus in parallel. The priority encoder 104 extracts and outputs the outputs to the outside. In this constitution, first, search is performed in the first divided CAM array and only a match line ML of the second divided array connected to a sub-array having a matching match line ML operates. Thus, it is possible to realize low power consumption.
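The two-stage search can be approximated by the sketch below, in which the second half of a row is evaluated only when its first half matched. The function name `two_stage_search`, the split position, and the per-row gating granularity are simplifying assumptions for illustration; the actual gating in U.S. Pat. No. 6,470,418 may operate on different units.

```python
from typing import List, Tuple


def ternary_match(stored: str, key: str) -> bool:
    return all(s == "X" or s == k for s, k in zip(stored, key))


def two_stage_search(rows: List[str], key: str,
                     split: int) -> Tuple[List[int], int]:
    """Return (hit addresses, number of second-stage match lines operated)."""
    first_key, second_key = key[:split], key[split:]
    hits: List[int] = []
    second_stage_active = 0
    for address, row in enumerate(rows):
        # Stage 1: search request (1/2) against the first divided array.
        if not ternary_match(row[:split], first_key):
            continue  # the second-stage match line of this row stays idle
        # Stage 2: search request (2/2), only for rows that survived stage 1.
        second_stage_active += 1
        if ternary_match(row[split:], second_key):
            hits.append(address)
    return hits, second_stage_active


rows = ["1010", "10XX", "0XXX", "1110"]
result = two_stage_search(rows, "1011", split=2)
print(result)  # ([1], 2)
```

In this example only two of the four second-stage match lines operate, while the final hit is the same as for a full-width search.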
The CAM also has a problem of manufacturing cost in addition to the problem of power consumption. A technology generation for memory LSIs changes about every four years. With each generation, a capacity of the memory LSIs is expanded to twice to four times that of the previous generation while a substantially fixed price is kept. Thus, a bit unit price is reduced to a half or a quarter every time the generation changes. However, this trend does not apply to the CAM at all, although the CAM is one of the memory LSIs. For example, a market price of a 4.5 M bit TCAM is about 50 dollars and a market price of an 18 M bit TCAM having a capacity four times as large as that of the 4.5 M bit TCAM is equal to or higher than 200 dollars. There are various factors affecting market prices such as competitions among companies in the same businesses and supply and demand balances. A unit price per bit of the CAM is about twenty times as high as that of the SRAM because of product cost.
In the memory LSIs, it is possible to keep fixed prices and manufacturing cost even if a memory capacity increases to twice to four times as large as those of previous ones. This mainly depends on a technology for refining and improvement of a yield. The refining technology involved in formation of transistors, wiring layers, and the like has significantly improved the number of elements and a memory capacity mountable per a unit area. On the other hand, the increase in a mounted capacity means an increase in sensitivity for adhesion of dust (particulates) that is a main cause of defects affecting a manufacturing yield. Thus, the manufacturing yield is extremely reduced. For example, as the refining technology advances to reduce a size to 0.25 micrometer, 0.15 micrometer, and 90 nanometers, a memory capacity mountable per a unit area increases. However, in general, a test yield in manufacturing decreases. Nevertheless, a situation is different in the case of the memory LSIs.
In the case of the memory LSIs, a test process in manufacturing is divided into a pre-test process and a post-test process. Pre-test indicates a test and a yield before remedial measures (repair) at the point when a manufacturing process ends. Post-test indicates a test and a final yield after remedial measures after the pre-test. This repair technology itself makes it possible to keep the substantially fixed market prices of the memory LSIs in these days.
For example, when it is assumed that pre-test yields of the memory LSIs in the technologies for 0.25 micrometer, 0.15 micrometer, and 90 nanometers are 60%, 50%, and 40%, respectively, remedies by repair measures increase the yields by 25%, 35%, and 45%, respectively. As a result, post-test yields indicate a fixed value of 85%. Consequently, it is possible to realize substantially fixed manufacturing cost regardless of a mounted memory capacity. As in this example, the repair technology contributes more as the technology becomes more refined. For the CAM, a fixed cost structure and a fixed market price cannot be kept when a mounted capacity increases. A reason for this may have a significant relation with the repair technology.
Data writing in and readout from the CAM are executed for a data sequence selected by a decoder. On the other hand, searches are executed for the CAM array simultaneously and in parallel. As a result, an address with all data matching is encoded and output to the outside. In other words, whereas only a decoder is adjacent to a memory array of a usual memory (DRAM or SRAM), a decoder and an encoder are adjacent to a CAM memory array. In addition, in the case of the TCAM, as described above, a plurality of data sequences are often hit simultaneously at the time of search according to the effect of the “X” value. Thus, a priority encoder that encodes data matching at “0” and “1” with the least “X” value as a final address is generally adopted.
In this case, mounting of the priority encoder reduces an area occupied by a memory cell. For example, in the case of the 18 M bit TCAM using the 0.13 micrometer technology, a decoder has an occupied area of 0.03 as opposed to a TCAM cell sub-array area 1. The priority encoder has an occupied area of 0.19. As a general technology for the decoder, other addresses are simultaneously activated and a block is selected again in an I/O part. However, in the priority encoder, an encoder cell has to be mounted on each address.
As described above, an array constituting the CAM requires signal wiring of the search line SL and the match line ML other than the bit line BL and the word line WL. In other words, for example, when the TCAM is created based on a technology in which the SRAM is manufactured with four layers of wiring, normally, a manufacturing technology for six layers of wiring is required. This complicates processes. This means that the CAM memory cell becomes excessively sensitive to adhesion of dust (particulates). Even if the CAM is manufactured in a factory where a frequency of dust generation is equivalent to that for the SRAM, a manufacturing yield for the CAM is low compared with that of the SRAM. It goes without saying that it is possible to forcibly create a CAM memory cell with four layers of wiring. However, in that case, a cell area significantly increases and, eventually, manufacturing cost per bit increases.
A serious problem in realizing a reduction in manufacturing cost for the CAM is that it is difficult to adopt a remedial technology (a repair technology) by mounting redundant circuits. In the SRAM, even if dust (particulates) adheres to a memory cell to cause a defect, it is possible to behave as if there is no defect by replacing the memory cell or a memory cell group with redundant circuits prepared in an entirely different place. For the CAM, various test methods are proposed (e.g., Japanese Patent Application Laid-Open Nos. H9-180498, H6-131897, H8-147999, 2002-260389, and H5-190788). However, since there are several technical problems, it is difficult to apply the repair technology to the test methods.
To perform remedies using the repair technology, first, an address having a defect has to be accurately recognized by a test. In this regard, for the SRAM or an SRAM unit in the CAM cell array, it is possible to perform writing and readout in units of bit. Thus, if writing and readout for each address and bit are repeated, it is possible to relatively easily recognize and specify a defective address. However, in a search unit of the CAM, such accurate recognition of a defective address is not easy.
As explained with reference to FIG. 27, when all data sequences in a large number of CAM cells (data sequences of the CAM) connected in parallel in a wired-OR connection system and search request data sequences match for each bit, the match line ML maintains the high level to which the match line ML is pre-charged. Conversely, if at least one bit does not match, the match line ML discharges to be at the low level.
Therefore, for example, when the match line ML discharges because of a certain kind of defect, if the defect is only one bit, it is possible to detect a defective address by rewriting data bit by bit after setting an expected value to a “HIT state”. However, if at least another one bit of MISS is present, the match line ML discharges from the MISS bit. Thus, it is impossible to detect a cell including a search port having a defect.
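Why a second MISS bit hides the defective cell can be shown with a toy fault model. Everything here is an assumption made for illustration: a faulty cell is modeled as always reporting a mismatch unless it stores “X”, and the bit-by-bit rewrite is modeled as masking one position at a time with “X”. The names `match_line` and `locate_single_fault` are hypothetical, and the source does not specify the actual test procedure.

```python
from typing import FrozenSet, List


def match_line(stored: str, key: str,
               faulty_bits: FrozenSet[int] = frozenset()) -> bool:
    """Wired-OR model: ML is high only if no cell reports a mismatch."""
    for position, (s, k) in enumerate(zip(stored, key)):
        if s == "X":
            continue  # assumed: a masked cell cannot discharge the line
        if position in faulty_bits or s != k:
            return False
    return True


def locate_single_fault(key: str, faulty_bits: FrozenSet[int]) -> List[int]:
    """Mask one bit at a time; report positions where masking restores a HIT."""
    suspects = []
    for position in range(len(key)):
        stored = key[:position] + "X" + key[position + 1:]
        if match_line(stored, key, faulty_bits):
            suspects.append(position)
    return suspects


one_fault = locate_single_fault("1011", frozenset({2}))
two_faults = locate_single_fault("1011", frozenset({1, 3}))
print(one_fault)   # [2]
print(two_faults)  # []
```

With a single fault the masked position that restores the HIT identifies the cell, but with two faults every single-bit rewrite still ends in a discharged match line, so no position can be singled out.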
The above is a problem of the CAM cells connected to the match line ML in the horizontal direction in the wired-OR connection system. The priority encoder present in the vertical direction also makes it difficult to apply the repair technology. There are two aspects in the difficulty in applying the repair technology.
In a first aspect of the priority encoder that makes it difficult to apply the repair technology, since cells of the priority encoder communicate with cells above and below the cells for priority control, it is not allowed to move the priority encoder to an entirely different place when a defect occurs. Therefore, in the CAM, for example, as described in U.S. Pat. No. 6,751,755 (FIG. 31), the remedial technology by mounting redundant circuits is usually adopted in a memory array but is not adopted in the priority encoder in which occupancy cannot be neglected compared with the decoder. Thus, a fall in cell occupancy directly means a fall in a manufacturing yield.
FIG. 31 is a block diagram showing a conventional example for improving the manufacturing yield of the CAM. In FIG. 31, data sequences RML0 and RML1 read out from redundant memory arrays by REDUNDANT ROW0 and REDUNDANT ROW1 are amplified by sense amplifiers RSA0 and RSA1, which correspond to the match amplifiers in this specification, to be match line outputs Rmatch0 and Rmatch1 on a redundancy side. On the other hand, data sequences ML0 to MLn read out from an original memory array are amplified by sense amplifiers SA0 to SAn to be match line outputs Match0 to Matchn on a memory array side.
Selectors 114-0 to 114-n are provided between this signal and input signal lines PEln0 to PElnn to a PRIORITY ENCODER. The selectors 114-0 to 114-n connect a Match side to the input signal lines PEln0 to PElnn when there is no defect and connect a Rmatch side to the input signal lines PEln0 to PElnn when a defect occurs and repair is necessary. According to this constitution, a priority relation with the cells above and below necessary in operating the CAM is observed in the PRIORITY ENCODER even after the repair.
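The role of the selectors can be sketched as a per-address multiplexer placed in front of the priority encoder. This is a minimal model under assumptions: `select_pe_inputs`, `repair_map` (a mapping from a defective row address to a redundant row index), and the example values are hypothetical and are not taken from U.S. Pat. No. 6,751,755.

```python
from typing import Dict, List, Optional


def select_pe_inputs(match: List[bool], rmatch: List[bool],
                     repair_map: Dict[int, int]) -> List[bool]:
    """Per address, pass Match through, or Rmatch if the row was repaired."""
    return [rmatch[repair_map[address]] if address in repair_map else hit
            for address, hit in enumerate(match)]


def priority_encode(match_lines: List[bool]) -> Optional[int]:
    for address, hit in enumerate(match_lines):
        if hit:
            return address
    return None


# Row 1 is assumed defective: its raw Match output is wrong (stuck high).
match = [False, True, False, True]
rmatch = [False, True]   # outputs of the two redundant rows
repair_map = {1: 0}      # row 1 is replaced by redundant row 0

pe_inputs = select_pe_inputs(match, rmatch, repair_map)
print(pe_inputs)                   # [False, False, False, True]
print(priority_encode(pe_inputs))  # 3
```

Because the redundant output is substituted at the original address position, the address order seen by the priority encoder is unchanged after the repair.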
FIG. 32 is a block diagram showing an example of a case in which the memory cell occupancy is increased in the CAM. Usually, in a design of a memory LSI, an array arrangement and a sub-array arrangement of a memory are devised with priority given to cell occupancy. When a mounted capacity of the CAM increases, occupancy of the priority encoder is reduced to increase the memory cell occupancy according to, for example, a method shown in FIG. 32.
In FIG. 32, 110_00 to 110_0n, 110_10 to 110_1n, and 110_m0 to 110_mn are divided sub-arrays. Redundant memory arrays 108 are provided in the respective sub-arrays. Match lines of the sub-arrays 110_00 to 110_0n are connected to a common match line 105_0 via match amplifiers 103_00 to 103_0n. Match lines of the sub-arrays 110_10 to 110_1n are connected to a common match line 105_1 via match amplifiers 103_10 to 103_1n. Match lines of the sub-arrays 110_m0 to 110_mn are connected to a common match line 105_m via match amplifiers 103_m0 to 103_mn. The common match lines 105_0 to 105_m are connected to the priority encoder 104. In this way, a method of layering match lines is also adopted: in the example shown in FIG. 32, outputs of the match amplifiers 103_00 to 103_0n, to which local match lines are connected, are commonly connected to the global match line 105_0 and then input to the priority encoder 104.
A second aspect of the priority encoder that makes it difficult to apply the repair technology is related to an algorithm for determining a priority. When a plurality of addresses are simultaneously hit rather than a single address as a result of execution of a search instruction and a plurality of match lines ML maintain the high level, as described above, usually, the priority encoder preferentially outputs a smallest address. As a result, in a large number of address groups other than the prioritized address, it is neglected whether a search result is HIT or MISS as well as whether operations of the addresses normally function. In this way, the function of the priority encoder hinders easiness of confirmation that should be the essence of the defect detection test.
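The loss of observability caused by the priority rule can be seen in a few lines. This is an illustrative sketch only: `priority_encode` and the two match-line vectors are assumptions made for the example, with the second vector standing for a device in which two lower-priority rows respond incorrectly.

```python
from typing import List, Optional


def priority_encode(match_lines: List[bool]) -> Optional[int]:
    """Return the smallest address whose match line is high, or None."""
    for address, hit in enumerate(match_lines):
        if hit:
            return address
    return None


expected = [True, False, False, True]  # match lines of a fault-free device
observed = [True, True, False, False]  # rows 1 and 3 respond incorrectly

same_output = priority_encode(expected) == priority_encode(observed)
print(priority_encode(expected), priority_encode(observed), same_output)
# 0 0 True
```

Both devices output address 0, so a test that observes only the encoded address cannot tell them apart whenever a smaller address also hits.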
However, concerning the reduction in power consumption, the technology described in U.S. Pat. No. 6,324,087 cannot easily change table sizes for respective applications. For example, from a viewpoint of an application, a table size required for filtering is twice or more as large as that for L2 and L3. Since filtering search is also diversified, it is desirable to perform control with various degrees of freedom. For example, an entire filtering table is set as an object of search or the filtering table is layered to set only a part of the table as an object of search.
The technology described in U.S. Pat. No. 6,470,418 can program a first CAM array data sequence according to a table size desired by a user. Thus, it is possible to designate a table size with a high degree of freedom. It is possible to avoid the problem in the technology described in U.S. Pat. No. 6,324,087. However, since all data sequences have extended CAM data sequences, cost overhead in terms of hardware for the extended CAM data sequences is large.
Concerning manufacturing cost for the CAM, as described above, the number of process layers increases because of complexity of the CAM memory cell, a yield falls because of the multi wiring layer structure, and it is difficult to adopt the remedial technology by mounting the redundant circuits equivalent to other memories for priority control. Therefore, remedial measures in the CAM are extremely poor compared with the other memories (SRAM and DRAM) and a manufacturing yield of the CAM is extremely low compared with the other memories. As a result, manufacturing cost per bit increases and a market price for the CAM is high.
In the technology described in U.S. Pat. No. 6,751,755 for improving a manufacturing yield of the CAM, as described above, it is possible to repair a defect in a memory area. However, it is impossible to repair a defect that occurs in the priority encoder or the sense amplifier. For example, in the 18 M bit TCAM using the 0.13 micrometer technology, an area ratio between the memory array and the sense amplifier and priority encoder is about 5:1. Compared with an area ratio between the memory array and the decoder of 39:1, it goes without saying that the repair technology is required for the priority encoder and the match amplifier (the sense amplifier) peculiar to the CAM.
In the method shown in FIG. 32 for increasing memory cell occupancy of the CAM, signal wiring such as the global match line is required. This further deteriorates durability against adhesion of particulates.
In general, failures distinguished by a result of a test are roughly classified into a functional failure (hereinafter, “operation functional failure”) and a marginal failure (hereinafter, “operation marginal failure”). The operation functional failure indicates a hardware error that can always be observed under any test conditions such as temperature and voltage and is often caused by a process. On the other hand, the operation marginal failure is a failure that is reproduced only under a certain condition such as a high operation frequency region or a high voltage side and is often caused by design.
In production, the categorization of the operation functional failure and the operation marginal failure is an important factor that should be evaluated in production management. For example, when the operation functional failure always occurs at a high failure rate, technical improvement is required for the process technology. When a failure rate of the operation functional failure is not so high but only a failure rate of a certain kind of operation marginal failure is high, it is possible that the failure is caused by some design deficiency peculiar to a product. Thus, an analysis and improvement from that aspect are performed. Usually, the operation marginal failure should transition at a failure rate lower than that of the operation functional failure. If a process technology is identical, a significant difference should not occur in a manufacturing yield depending on a product.
Concerning a test, as details of an actual test time, a test for detecting the operation marginal failure is more complicated than a test for detecting the operation functional failure. Thus, a test time for the operation marginal failure is longer. This is because a certain type of accelerated test condition peculiar to the operation marginal failure is created. In the SRAM and the DRAM, other than the operation functional failure, the operation marginal failure and tests for detecting the operation marginal failure are often studied. However, there is almost no study report concerning the operation marginal failure for the CAM and a test method for detecting the operation marginal failure.
Actually, a system environment in which the CAM is used is far stricter than a system environment in which the SRAM and the DRAM are used. This is because, as described above, since electric power equal to or larger than 10 watts is consumed in the single CAM, the power consumption significantly affects an operation margin of not only the LSI but also power supply impedance including a board. It is sufficiently possible that a CAM that has passed a test on a factory shipment test board may have the operation marginal failure on a system board used by a user. Therefore, it is meaningful in securing a quality after product shipment for the CAM to distinguish the operation functional failure and the operation marginal failure on a system and implement a test that can detect the failures.