This invention is in the field of integrated circuit testing. Embodiments are directed to the testing of embedded memories in large-scale integrated circuits.
Many modern electronic integrated circuits integrate essentially all necessary functional components of a computer system, whether general purpose or arranged for a particular end application. Those large scale integrated circuits that include the computational capability for controlling and managing a wide range of functions and useful applications are often referred to as “system on a chip”, or “SoC”, devices. Typical modern SoC architectures include one or more processor “cores” that carry out the digital computer functions of retrieving executable instructions from memory, performing arithmetic and logical operations on digital data retrieved from memory, and storing the results of those operations in memory. Other digital, analog, mixed-signal, or even RF functions may also be integrated into the SoC for acquiring and outputting the data processed by the processor cores. In any case, considering the large amount of digital data often involved in performing the complex functions of these modern devices, significant solid-state memory capacity is now commonly implemented in these SoC devices.
In order to optimize performance, memory resources are typically distributed throughout the typical modern SoC device. These memory resources can include both volatile and non-volatile memory. This distributed memory architecture results in memory resources being physically and electrically (or logically) proximate to the processing function that will be accessing them, although they may be physically and logically remote from other memory of the same type. For example, the deployment of local memory resources will minimize the traffic over the system bus, which reduces the likelihood of bus contention and undesirable latency, and also reduces access time and memory management overhead. The number of memory arrays realized throughout a modern large-scale SoC device can be quite large, numbering into the hundreds in some cases.
It is of course important to fully test the functionality and performance of integrated circuits at the time of manufacture, especially considering that memory resources can occupy much of the chip area of a typical modern SoC. As known in the art, conventional memory test algorithms can be quite time-consuming, particularly those involving test patterns of order O(n^x) where x is greater than one, and as such the test time and test cost involved can be dominated by memory test. The distribution of embedded memory resources throughout typical SoC devices further complicates the task of memory test, as many memory arrays are not directly accessible to external automated test equipment yet must still be tested.
As known in the art, SoC devices typically include internal test circuitry (“built-in self-test”, or “BIST”, circuitry) that executes a self-test operation for the device upon power-up or reset. BIST may also be involved in the testing of memory, both at the time of manufacture and also on power-up. Conventional BIST memory test techniques can include the placement of hardwired logic in the SoC, by way of which memory test algorithms developed at the time of circuit design are implemented; however, it may not be feasible to determine the particular tests to be performed at that early stage of the process. Another conventional BIST approach is to use the central processing unit of the SoC itself to perform the memory test. This approach can be limited, however, because not all embedded memory arrays in the device may be visible to the CPU, and are thus not testable by the CPU. Direct memory access (DMA) techniques for providing external access to embedded memories are also known, but typically are unable to access the memory at its full operating speed.
Because of these limitations, programmable BIST (“pBIST”) techniques have been developed to test embedded memories in the SoC context. U.S. Pat. No. 7,324,392 and U.S. Patent Application Publication No. US 2014/0164856, both commonly assigned herewith and incorporated herein by reference, describe examples of these pBIST techniques for testing embedded memories in large-scale integrated circuits such as SoC devices. According to these approaches, the pBIST circuitry includes a general purpose test controller that is programmed by a set of instructions to produce test conditions for the various internal and embedded functions of the device, and to receive and log the responses of those functions to those test conditions. In the memory test context, these operations include the writing of the desired data pattern to an embedded memory, and then addressing the memory to retrieve and compare the stored data to the expected data. Typically, the BIST data path over which the data are communicated during memory test is a separate and independent data path from that by which the embedded memories are accessed in normal operation of the SoC.
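The write, read, and compare sequence described above can be sketched in software as follows. This is a minimal model under stated assumptions, not the pBIST implementation of the referenced documents: the StuckAtMemory class, the checkerboard pattern, and the function names are hypothetical and serve only to show how a mismatch between stored and expected data is detected and logged.

```python
from typing import Callable, List, Optional


class StuckAtMemory:
    """Hypothetical memory model; optionally one bit at one address is stuck at 0."""

    def __init__(self, size: int, bad_addr: Optional[int] = None, bad_bit: int = 0):
        self.size = size
        self.cells = [0] * size
        self.bad_addr = bad_addr
        self.bad_mask = 1 << bad_bit

    def write(self, addr: int, data: int) -> None:
        self.cells[addr] = data

    def read(self, addr: int) -> int:
        data = self.cells[addr]
        if addr == self.bad_addr:
            data &= ~self.bad_mask  # the faulty bit always reads back as 0
        return data


def checkerboard(addr: int) -> int:
    """Assumed 8-bit data pattern: alternate 0x55 and 0xAA by address."""
    return 0x55 if addr % 2 == 0 else 0xAA


def run_pattern_test(mem: StuckAtMemory, pattern: Callable[[int], int]) -> List[int]:
    """Write the pattern to every address, then read back and compare.

    Returns the list of failing addresses (empty if the memory passes).
    """
    for addr in range(mem.size):
        mem.write(addr, pattern(addr))
    failures = []
    for addr in range(mem.size):
        if mem.read(addr) != pattern(addr):  # stored data vs. expected data
            failures.append(addr)
    return failures
```

For example, run_pattern_test(StuckAtMemory(8, bad_addr=4), checkerboard) logs address 4 as failing, because the pattern expects bit 0 to be set at that address.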
Because of the high test time and test cost for testing the memory capacity of the SoC device, as discussed above, BIST techniques have been developed for the parallel testing of embedded memories, such that multiple memory arrays are simultaneously tested. According to one conventional approach, this parallel test is implemented by instantiating multiple BIST controllers that simultaneously execute a test of an associated embedded memory. Of course, the provision of multiple BIST controllers multiplies the chip area required for the BIST test logic and data paths, forcing a trade-off between chip area and test time.
Conventional pBIST architectures, such as described in the above-incorporated U.S. Pat. No. 7,324,392, include a BIST controller that is shared by multiple memories of similar memory type (e.g., single-port, double-port, etc.). The shared BIST controller generates the test pattern to be written to the memories, and also the expected response from the memories when read. Each memory has a local comparator that compares the data read from its memory during the test with the expected data from the shared BIST controller, and forwards the results to the shared BIST controller. In order for the expected data from the shared BIST controller to align with the data read from the parallel embedded memories, this conventional arrangement includes a local response delay generator that aligns the expected data to account for access latency for that particular memory, and a local comparator that compares the delayed expected data with the data read from that particular memory and generates a pass/fail signature accordingly.
FIG. 1 illustrates an example of the architecture of a BIST memory test data path in a conventional SoC, in which shared BIST controller 10 supports the parallel test of memories 15 in a manner such as described in the above-incorporated U.S. Pat. No. 7,324,392. This test data path is separate and independent from the data path by way of which memories 15 are accessed in normal operation, which is not shown in FIG. 1 for the sake of clarity. As shown in this example, BIST controller 10 communicates with each memory 15 by way of one or more pipeline delay stages 12, in combination with an instance of local response delay generator 14 that is dedicated to that embedded memory 15. BIST controller 10 may be one of multiple such BIST controllers within the SoC. In architectures such as this example, a given BIST controller 10 is typically associated with memories 15 that are of a common type (e.g., single-port, double-port), considering that BIST controller 10 generates the particular test data pattern to be applied to its associated memories 15; as such, if the SoC includes multiple memory types, multiple BIST controllers 10 and associated data paths may be present. The data pattern generated by BIST controller 10 is applied directly to memories 15, after passing through the pipeline delay stages 12, but these data are not delayed by local response delay generators 14.
In this arrangement, pipeline delays 12 and each local response delay generator 14 delay the expected data response communicated from BIST controller 10 before application to the instance of local comparator 16 with which that local response delay generator 14 is associated. Local comparator 16 compares that delayed expected data response with the data read from its associated memory 15 during the memory test, and generates a pass/fail signature based on the results of that comparison. In this example, the pass/fail signatures generated by comparators 16 are communicated back to BIST controller 10, for example by way of parallel test data comparator 17 function, which produces an overall pass/fail signature for those memories 15 that were tested in parallel.
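The alignment performed by a local response delay generator, and the comparison performed by its local comparator, can be modeled cycle by cycle as in the following sketch. The function names and the representation of idle cycles as None are assumptions made for illustration; the model simply shows that the comparison passes only when the expected data are delayed by the same number of cycles as the access latency of the memory under test.

```python
from collections import deque
from typing import List, Optional, Sequence

Word = Optional[int]  # None models an idle cycle with no valid data


def delay_line(stream: Sequence[Word], cycles: int) -> List[Word]:
    """Delay a per-cycle stream by `cycles` clock cycles (same length out as in)."""
    buf = deque([None] * cycles)
    out = []
    for word in stream:
        buf.append(word)
        out.append(buf.popleft())
    return out


def memory_read_stream(stored: Sequence[int], latency: int) -> List[Word]:
    """Read data as seen at the memory output: valid `latency` cycles after request."""
    return delay_line(list(stored) + [None] * latency, latency)


def local_compare(expected: Sequence[int], read_stream: Sequence[Word],
                  delay: int) -> bool:
    """Local comparator: delay the expected data, then compare with the read data."""
    padded = list(expected) + [None] * (len(read_stream) - len(expected))
    aligned = delay_line(padded, delay)
    return all(e == r for e, r in zip(aligned, read_stream) if e is not None)


def parallel_compare(signatures: Sequence[bool]) -> bool:
    """Overall result: pass only if every memory tested in parallel passed."""
    return all(signatures)
```

As a usage example, with expected data [0x55, 0xAA, 0x55, 0xAA] and a memory of latency 2, local_compare passes with a delay of 2 but fails with a delay of 0, since the expected words would then be compared against cycles in which the memory has not yet produced valid data.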
In this conventional architecture, instances of pipeline delays 12 may be shared by embedded memories 15 that are generally in the vicinity of one another. For example, pipeline delay 12₀ is shared by all embedded memories 15 shown in FIG. 1, while pipeline delay 12₁ is shared by embedded memories 15 of group 18₁ that are in the general vicinity of one another, and pipeline delay 12₂ is shared by embedded memories 15 in group 18₂ that are in the general vicinity of one another. Each of pipeline delays 12 essentially operates as one or more clocked buffer stages for the data communicated by BIST controller 10, such that a data word applied at the input of an instance of pipeline delay 12 will appear at its output after a delay of x clock cycles, where x is the number of buffer stages in that pipeline delay 12. Each local response delay generator 14 is similarly constructed, and operates to delay the expected data it receives by one or more clock cycles, so as to align it with the memory access latency of its associated embedded memory 15.
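The behavior of a pipeline delay as a chain of clocked buffer stages can be sketched as a simple shift register. The class and function names below are hypothetical, and the stage counts in the usage note are chosen arbitrarily; the sketch only illustrates that a word emerges x clock cycles after it is applied, and that the delays of cascaded instances (for example, a shared instance followed by a group-level instance) add.

```python
from typing import List, Optional, Sequence

Word = Optional[str]  # None models a cycle with no valid data


class PipelineDelay:
    """Model of x clocked buffer stages connected in series."""

    def __init__(self, stages: int):
        if stages < 1:
            raise ValueError("a pipeline delay needs at least one stage")
        self.regs: List[Word] = [None] * stages

    def clock(self, word: Word) -> Word:
        """One clock edge: return the last stage's content and shift `word` in."""
        out = self.regs[-1]
        self.regs = [word] + self.regs[:-1]
        return out


def run_chain(chain: Sequence[PipelineDelay], words: Sequence[Word],
              drain: int) -> List[Word]:
    """Clock `words` through cascaded delays, plus `drain` idle cycles to flush."""
    outputs = []
    for word in list(words) + [None] * drain:
        for stage in chain:
            word = stage.clock(word)  # output of one instance feeds the next
        outputs.append(word)
    return outputs
```

For instance, run_chain([PipelineDelay(2), PipelineDelay(1)], ["w0", "w1"], 3) yields three idle cycles followed by "w0" and "w1", that is, a total delay of three clock cycles for the cascade.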
While the pipeline architecture in this conventional arrangement is “physically aware” by sharing pipeline stages 12 based on the general physical proximity of embedded memories 15, dedicated local response delay generators 14 must still be provided in this architecture. These dedicated local response delay generators 14 can each occupy significant chip area, especially in the case of very wide data words (e.g., up to 128 bits) that are now often required in many modern SoC devices. In some cases, particularly those in which the overall chip area of the SoC is constrained by packaging considerations and other constraints, the chip area consumed by these dedicated local response delay generators 14 can be prohibitive, such that parallel memory test cannot be implemented.