The present invention relates to semiconductor integrated circuits and, more particularly, to test circuits built into the integrated circuits (ICs) that enable efficient testing of embedded memory, especially read/write memory.
As integrated circuits achieve higher and higher levels of integration, it is common to find several memory blocks of differing sizes embedded within blocks of logic in the integrated circuit. Typical examples of embedded memory are the data and instruction cache memories, along with their associated tag and valid data cache memories, that are embedded in most modern microprocessors. These memories are called "embedded" because they are not directly accessible from the input and output pins of the integrated circuit chip. Instead, an embedded memory is separated by logic blocks from the input and output pins in ordinary operation of the circuit. Testing of these embedded memories is therefore complicated because any access to these memories during normal operation of the chip is mediated by the associated logic.
Integrated circuits are widely used because they offer high functionality per unit of cost. To achieve the economies necessary in modern integrated circuit manufacturing, it is necessary to minimize both the cost of the raw circuit and the cost of testing it. In many cases, the cost of testing the device is comparable to the cost of manufacturing the raw die in the fabrication plant. The cost of a functional die increases roughly exponentially with the die area, since yield falls off approximately exponentially as the area grows. Therefore, it is necessary to minimize the die area in order to minimize die costs. The cost of testing is approximately proportional to the product of the test time and the cost of the testing equipment. Therefore, it is desirable to minimize both the test time and the complexity of the test equipment to minimize testing costs.
Testing of memories is generally accomplished by applying test vectors to the memory and reading back the results to ensure proper memory operation. However, testing an embedded memory through the surrounding logic may require more test vectors than fit in the memory available in the automatic test equipment used for testing the device and is, in any case, very time-consuming. It is additionally undesirable because the development of programs to execute such tests requires a large amount of skilled test engineering time, which adds to the overhead costs.
Another possible approach to testing embedded memories is to connect the control, address, and data lines of the memories to external pads of the integrated circuit. Multiplexer blocks are implemented within the integrated circuit to connect the embedded memories either to the external pads for testing or to internal buses for standard circuit operation. A drawback to this approach is that the extra bus lines and pads increase the size of the semiconductor die and the extra pads increase the number of pins required of the tester. The cost of the tester is roughly proportional to the number of pins. Since the trend is toward wide memories of increasingly large capacity in modern ICs, the number of extra buses and pads required can frequently exceed one hundred, which represents a prohibitive cost burden.
To avoid excessive costs while simultaneously providing adequate fault coverage, there has been a movement toward built-in self test (BIST) of integrated circuits. This approach relies on circuitry built into the integrated circuit to test the memories and report the results to off-chip electronics by means of a restricted number of pins. An example of BIST methodology is the commonly used set of Joint Test Action Group (JTAG) standards. Special test modes which disable the normal operation of the circuit are invoked to enable BIST.
BIST attempts to provide complete fault coverage while minimizing test time and the area of the die that is occupied by the BIST circuitry. In some applications, it is also desirable that diagnostic information be available for faults that are detected. These requirements are in conflict because adding diagnostic capability adds size to the BIST. Various schemes have been developed which optimize one factor at the expense of the others.
One method for reducing the area on the chip devoted to data buses is to use a serial data-in line and a serial data-out line. Buffers are loaded serially and then used for parallel operation during writing, reading and comparison of the results read from the memory with the stored data. A disadvantage to this approach is that the maximum operational frequency is reduced by a factor equal to the width of the data word (e.g. 32 bits), so that the memory is tested at much less than its operational frequency. Thus, faults that appear only at normal speed operation, such as capacitive coupling faults and transition faults, are not detected. Another consequence is that the time needed to test the memory is increased by the time necessary to load the buffers serially. This can increase the test time by a factor approximately equal to the width of the memory words.
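The test-time penalty of serial loading can be illustrated with a rough cycle count. The sketch below is illustrative only: the function names, the assumption of one clock cycle per serially shifted bit, and the assumption of one cycle per parallel access are hypothetical and are not taken from any particular design.

```python
def parallel_test_cycles(num_words: int, ops_per_word: int) -> int:
    """Cycles needed when every read or write of a full word takes one cycle."""
    return num_words * ops_per_word


def serial_test_cycles(num_words: int, ops_per_word: int, word_width: int) -> int:
    """Cycles needed when each access is preceded by shifting word_width bits
    through the serial buffer at one bit per cycle (assumed), followed by the
    one-cycle parallel access itself."""
    return num_words * ops_per_word * (word_width + 1)


# Hypothetical example: 1024 words, 10 operations per word, 32-bit words.
words, ops, width = 1024, 10, 32
slowdown = serial_test_cycles(words, ops, width) / parallel_test_cycles(words, ops)
print(slowdown)
```

Under these assumptions the slowdown is the word width plus one, which is consistent with the statement that test time grows by a factor approximately equal to the width of the memory words.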
Another approach is to add multiplexers to the memory input/output lines such that the data read from the memory can be loaded back into adjacent bits during the subsequent write while the memory is in the test mode. Thus, the data from bit 1 is available for writing into bit 2; the data from bit 2 is available for writing into bit 3; etc. The first bit receives new data and the data output from the last bit is routed back to the finite state machine BIST controller for comparison. In operational mode, the multiplexers connect the memory data lines to the chip data bus. Because data is always available for writing when a read operation is completed, the memory may be tested at operational speeds, which increases the quality and accuracy of the test procedure.
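The bit-to-adjacent-bit write-back described above can be modeled behaviorally. The following is a minimal sketch for a single fault-free word, assuming the word starts out holding all zeros; the names shift_write_back and run_serial_pattern are hypothetical, and the model ignores timing and the multiplexer hardware itself.

```python
from typing import List, Tuple


def shift_write_back(word: List[int], new_bit: int) -> Tuple[List[int], int]:
    """Model one read-then-write step on a word in test mode.

    word[0] is the first bit and word[-1] is the last bit. The value read
    from each bit is written into the next bit, the first bit receives
    new_bit, and the value read from the last bit is returned for comparison
    by the BIST controller.
    """
    serial_out = word[-1]
    written = [new_bit] + word[:-1]
    return written, serial_out


def run_serial_pattern(width: int, pattern: List[int]) -> List[int]:
    """Shift a bit pattern through one word and collect the serial output."""
    word = [0] * width  # assumed initial contents
    observed = []
    for bit in pattern:
        word, out = shift_write_back(word, bit)
        observed.append(out)
    return observed


pattern = [1, 0, 1, 1, 0, 0, 0, 0]
observed = run_serial_pattern(4, pattern)
print(observed)
```

In this fault-free model each injected bit reaches the controller after a delay equal to the word width, so the controller can compare the serial output with a delayed copy of the pattern it supplied.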
Several ways of implementing this scheme are possible. In one possible implementation, the output of the last bit of a word in the first memory is fed into the input of the first bit of a word in the second memory, etc. so as to make all of the memories into effectively one very wide memory for testing purposes. Another implementation involves adding a series of control lines so that each memory can be enabled separately. This allows each memory to be tested sequentially. In the case that the embedded memories are of differing depths, the second method must be used because the first method requires that the memory depths be the same.
There are certain drawbacks to these approaches. For example, although the above implementation offers the advantage of small area utilization, it is nonetheless relatively slow. Furthermore, in the case of a failure, all that is known is the word address of the failure. Information as to which bit failed is not available because the word is structured to operate as a serial shift register with no internal observability. Indeed, if the first proposed method is used, in which the memories are chained into effectively one very wide memory, not even the memory that failed can be ascertained. For simple pass or fail testing, it is sufficient to identify that a failure has occurred. However, if redundancy is to be used to repair the failure or if the cause of the failure is to be analyzed, critical information is not available. In fact, if the word were to contain an even number of transition or capacitive coupling faults, each of which causes a bit to read the opposite of the intended data, even the presence of the faults would be masked.
An alternate approach is to generate data patterns and address sequences centrally and route them to the embedded memories. This approach is faster than the above serial test approach, especially if several embedded memories are tested in parallel. A drawback to this approach is that routing the extra data and address buses consumes significant amounts of area on the chip as the data path width increases from the historical size of 8 bits to 32 or 64 bits, which are increasingly common. It may not be possible to use the same buses for testing and normal operation because the testing signals should be routed in parallel to the embedded memories while the buses in operation are often separate, e.g. the case of data and instruction caches. This means that testing requires extra buses plus a multiplexer per data and address line.
It has been proposed to reduce the busing area by using a separate pattern generator for each array to be tested and routing only a simple coded instruction from the controller to the pattern generator to instruct the pattern generator which of a set of canned tests stored in the pattern generator to execute. This approach saves on routing area at the expense of the area necessary to create individual pattern generators to test a plurality of memories.
While parallel testing of embedded memories is desirable from a speed standpoint, different embedded memories (e.g. data cache RAM and the associated tag cache RAM) in an integrated circuit are often not of the same size. If two memories of different sizes are written with the same data pattern and writing to the smaller memory is not inhibited once its address space is exceeded, the data intended to fill the remaining space in the larger memory will overwrite the data in the smaller memory, starting with its lower order addresses. This situation could easily result in incorrect test results for the smaller memory.
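The overwrite problem can be reproduced in a small behavioral model. The sketch below is an assumption-laden illustration: the function name write_pattern is hypothetical, the data pattern is simply the address itself, and the smaller memory is assumed to decode only its low-order address bits, which is modeled as the address modulo its depth (accurate when that depth is a power of two).

```python
from typing import List, Optional, Tuple


def write_pattern(depth_large: int, depth_small: int,
                  inhibit: bool) -> Tuple[List[Optional[int]], List[Optional[int]]]:
    """Write the same address-as-data pattern to two memories in parallel."""
    large: List[Optional[int]] = [None] * depth_large
    small: List[Optional[int]] = [None] * depth_small
    for addr in range(depth_large):
        data = addr  # assumed pattern: each word should hold its own address
        large[addr] = data
        if inhibit and addr >= depth_small:
            continue  # write to the smaller memory is suppressed
        # The smaller memory sees only the low-order address bits (assumed).
        small[addr % depth_small] = data
    return large, small


_, small_unprotected = write_pattern(8, 4, inhibit=False)
_, small_protected = write_pattern(8, 4, inhibit=True)
print(small_unprotected)
print(small_protected)
```

Without the inhibit, the smaller memory ends up holding the data meant for the upper half of the larger memory, so a subsequent read-and-compare against the expected pattern would report failures in a memory that is actually good.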
One approach that has been proposed to solve this problem is to use the state of the higher order addresses to inhibit the write signal to the smaller memory, which can be efficient in a few special cases. For example, if one memory is smaller in the row direction and the size of the row address space of the larger array is a binary multiple (e.g. 2^k times) of that of the smaller memory, OR'ing the higher order row addresses that are unused in the smaller memory provides a simple means of generating the needed inhibit signal. However, for the more general case in which the smaller array is of an arbitrary size that is not a binary submultiple of the larger array, a magnitude comparator is required, which becomes prohibitively complex for larger address spaces and consequently consumes an unacceptably large chip area.
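The difference between the two inhibit schemes can be sketched in software. This is a behavioral illustration only, with hypothetical function names and example sizes; it does not attempt to model the gate count of either circuit.

```python
def inhibit_by_or(row_addr: int, small_row_bits: int, large_row_bits: int) -> bool:
    """OR together the row-address bits that the smaller memory does not use."""
    all_bits = (1 << large_row_bits) - 1
    low_bits = (1 << small_row_bits) - 1
    high_mask = all_bits & ~low_bits
    return (row_addr & high_mask) != 0


def inhibit_by_compare(row_addr: int, small_rows: int) -> bool:
    """Magnitude comparison against the number of rows in the smaller memory."""
    return row_addr >= small_rows


# Larger array: 64 rows (6 address bits). Smaller array: 16 rows (4 bits).
or_matches_compare = all(
    inhibit_by_or(a, 4, 6) == inhibit_by_compare(a, 16) for a in range(64)
)

# Smaller array of 48 rows still needs 6 address bits, so no bits are unused
# and the OR scheme never asserts the inhibit for rows 48 through 63.
missed = [a for a in range(64)
          if inhibit_by_compare(a, 48) and not inhibit_by_or(a, 6, 6)]
print(or_matches_compare, len(missed))
```

When the smaller row count is a power of two, the OR of the unused high-order bits agrees with the comparator at every address. For an arbitrary row count such as 48, the OR scheme fails to inhibit the out-of-range rows, which is why a full magnitude comparison is needed in the general case.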
In some types of memory there are important differences between row and column addresses. For example, DRAMs sense a complete row at once. Therefore it is common that the access time for address transitions among the column addresses within the same row is much shorter than the access time for address transitions that involve selection of differing rows. Similarly, certain types of nonvolatile memories have the capability to write to a page at a time wherein the page lies along the same row. Because of this capability, the write timings for transitions along a row may be quite different from those for transitions from one row to another.
Despite the differences that may occur in row and column timings, it has been common practice in memory BIST to treat the address space as an undifferentiated whole with no distinction made between row and column addresses. Thus a single counter is used for the address generation for the cases in which the addresses are generated locally. This may be because of the implicit assumption that the embedded memories are SRAMs which are often designed to exhibit little difference between row and column address transitions in both the read and write modes. Although SRAMs are probably the most common type of embedded memory, use of nonvolatile memories and DRAMs as embedded memory is becoming more common.
Therefore, a need exists for a technique that allows the row and column addresses to be varied in a controlled manner so that, for example, the entire column address space can be accessed each time there is a transition in the row address space. Moreover, because the embedded memories may not all have the same organization of rows and columns, a need exists for the technique to allow each embedded memory on an integrated circuit to be tested according to a different pattern of row and column addresses if necessary.
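One way to picture the kind of address sequencing called for here is a pair of nested counters rather than a single undifferentiated counter. The sketch below is purely illustrative and is not a description of any particular circuit; the function name row_column_addresses and the column_fast parameter are hypothetical.

```python
from typing import Iterator, List, Tuple


def row_column_addresses(num_rows: int, num_cols: int,
                         column_fast: bool = True) -> Iterator[Tuple[int, int]]:
    """Yield (row, column) pairs with separately controlled counters.

    With column_fast=True the whole column address space is swept for each
    row before the row address changes. With column_fast=False the roles
    are exchanged.
    """
    if column_fast:
        for row in range(num_rows):
            for col in range(num_cols):
                yield row, col
    else:
        for col in range(num_cols):
            for row in range(num_rows):
                yield row, col


def count_row_transitions(sequence: List[Tuple[int, int]]) -> int:
    """Count how many consecutive accesses select a different row."""
    return sum(1 for prev, cur in zip(sequence, sequence[1:]) if prev[0] != cur[0])


column_fast_seq = list(row_column_addresses(2, 3, column_fast=True))
row_fast_seq = list(row_column_addresses(2, 3, column_fast=False))
print(column_fast_seq)
print(count_row_transitions(column_fast_seq), count_row_transitions(row_fast_seq))
```

Because the row and column dimensions are parameters, each memory could in principle be given its own organization and sweep order, which matters when row transitions and column transitions have different timings.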