Field
The technology relates to electronic design automation in the integrated circuit industry. Various embodiments relate to test data volume and test application time reduction during testing of integrated circuits and more particularly for decompressing test stimuli and minimizing toggling rate in the decompressed test stimuli.
Description of Related Art
Electronic design automation (EDA) is applied in the semiconductor industry for virtually all design projects. After an idea for the product is developed, EDA tools are used to define a specific implementation including lithographic masks for production of the finished chips, in a process referred to as tape-out. The lithographic masks are then used with fabrication equipment to manufacture integrated circuit wafers. Testing and diagnosis are required steps to identify defective dies and to localize defects. Next, physical failure analysis is performed to identify root causes of systematic defects, which are used to correct the masks and to improve the design and fabrication process in order to increase yield. Finally, the wafers are diced, packaged and assembled to provide integrated circuit chips for distribution.
An exemplary procedure for using EDA tools begins with a design specification of a product to be implemented using the integrated circuit. Next, logic design tools are applied to create a high level description based on description languages such as Verilog or VHDL, and functional verification tools are applied in an iterative process to assure that the high-level description accomplishes the design specification. Next, synthesis and design-for-test tools are used to translate the high-level description to a netlist, optimize the netlist for target technology, and insert test logic that permits testing of the finished chips.
A typical design flow might next include a design planning stage, in which an overall floor plan for the chip is constructed and analyzed to ensure that timing parameters for the netlist can be achieved at a high level. Next, the netlist may be rigorously checked for compliance with timing constraints and with the functional definitions defined at a high level using VHDL or Verilog. After an iterative process to settle on a netlist and map the netlist to a cell library for the final design, a physical implementation tool is used for placement and routing. A tool performing placement positions circuit elements on the layout, and a tool performing routing defines interconnects for the circuit elements.
The components defined after placement and routing are usually then analyzed at the transistor level using an extraction tool, and verified to ensure that the circuit function is achieved and timing constraints are met. The placement and routing process can be revisited as needed in an iterative fashion. Next, the design is subjected to physical verification procedures, such as design rule checking (DRC), layout rule checking (LRC) and layout versus schematic (LVS) checking, that analyze manufacturability, electrical performance, lithographic parameters, and circuit correctness.
After closure on an acceptable design by iteration through design and verify procedures, like those described above, the resulting design can be subjected to resolution enhancement techniques that provide geometric manipulations of the layout to improve manufacturability. Finally, the mask data is prepared and taped out for use in producing finished products.
This design process with EDA tools includes circuitry that allows the finished product to be tested. Efficient testing of integrated circuits often uses structured design for testability (DFT) techniques. In particular, these techniques are based on the general concepts of making all or some memory elements like flip-flops and latches in the circuit under test (CUT) directly controllable and observable. The most-often used DFT methodology is based on scan chains. This approach assumes that during testing all (or almost all) memory elements are included in shift registers. As a result, the designed logic circuit has two (or more) modes of operation, including at least a functional mode and a test mode. In the functional mode, the memory elements perform their regular functions. In the test mode, the memory elements become scan cells that are connected to form shift registers called scan chains. These scan chains are used to scan-in test stimuli into a CUT and scan-out test responses. Applying a test pattern consists of performing scan-in (loading) the test stimulus, applying one or more capture clocks, and then performing scan-out (unloading) the captured test response. The test responses are then compared to fault-free test responses to determine whether the CUT works properly.
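The scan-in, capture, scan-out sequence described above can be illustrated with a minimal behavioral sketch in Python. It is not taken from any particular tool: the single three-cell scan chain, the functions `good_cut` and `faulty_cut`, and the stuck-at-0 fault they model are all hypothetical, chosen only to show how a captured response is compared with the fault-free response.

```python
from typing import Callable, List

Logic = Callable[[List[int]], List[int]]

def good_cut(s: List[int]) -> List[int]:
    """Hypothetical fault-free combinational logic between scan cells."""
    return [s[0] ^ s[1], s[1] & s[2], s[2]]

def faulty_cut(s: List[int]) -> List[int]:
    """Same logic with an assumed stuck-at-0 fault on the second output."""
    return [s[0] ^ s[1], 0, s[2]]

def apply_scan_pattern(stimulus: List[int], logic: Logic) -> List[int]:
    """Apply one test pattern to a single scan chain (test mode)."""
    chain = [0] * len(stimulus)
    # Scan-in: one bit enters the first scan cell per shift cycle.
    for bit in reversed(stimulus):
        chain = [bit] + chain[:-1]
    # Capture: one capture clock stores the logic outputs in the scan cells.
    chain = logic(chain)
    # Scan-out: one bit leaves the last scan cell per shift cycle.
    response = []
    for _ in range(len(chain)):
        response.append(chain[-1])
        chain = [0] + chain[:-1]
    return response[::-1]  # reorder so index i is the value captured in cell i

def detects(stimulus: List[int], cut: Logic, fault_free: Logic) -> bool:
    """True if the unloaded response differs from the fault-free response."""
    return apply_scan_pattern(stimulus, cut) != apply_scan_pattern(stimulus, fault_free)
```

In this sketch, `detects([1, 1, 1], faulty_cut, good_cut)` is true because the stimulus sensitizes the assumed fault, whereas `detects([1, 0, 1], faulty_cut, good_cut)` is false because both circuits capture the same response.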
The DFT methodology has been widely used in order to simplify testing and diagnosis. From the point of view of automatic test pattern generation (ATPG), a CUT can be treated as a combinational or partially combinational circuit. Today, ATPG software tools are able to generate a set of test patterns based on different fault models including stuck-at, transition, path delay, and bridging faults. When a particular fault in a CUT is targeted by an ATPG tool, only a small number of scan cells (typically less than 1 percent) are set to particular values (called specified care bits), and one scan cell (an observable point) is observed in order to detect this fault; the specified care bits are required to sensitize the fault and propagate the fault effect to the selected observable point. A common approach for test data volume reduction (TDVR) and test application time reduction (TATR) is to use compressed test data rather than storing the entire test stimulus and the entire test response in the tester. A block diagram of an integrated circuit having an on-chip test data compression capability is shown in FIG. 1. Accordingly, a tester is coupled to an integrated circuit comprising a CUT, a decompressor and a compressor. In addition, the CUT may have one or more cores such that each core has an individual decompressor and compressor. Characteristics of the decompressor and compressor schemes, as well as the routing of compressed test data from and to the tester, have a major impact on the level of test data compression. Hereafter, the discussion is focused on the decompressor scheme and an encoding process for mapping the care bits specified during ATPG into a pattern such that the bits of the decompressed test stimulus which are loaded into the CUT contain all specified care bits.
Prospective decompressor schemes are summarized in FIG. 2. Accordingly, decompression schemes are classified as combinational FIG. 2(a), sequential with limited sequential depth FIG. 2(b) and sequential FIG. 2(c). Combinational decompressors include a combinational block typically comprising XOR and XNOR gates (for linear decompressors) and MUX gates (for non-linear decompressors) such that the decompressed test stimuli loaded into the scan chains are calculated as a logic function of one or more streaming tester channels. The combinational decompressors have simple hardware that often supports dynamic encoding, wherein the encoding process is incorporated into the ATPG implication process. A challenge for the combinational decompressors is that they must encode all specified care bits of one shift cycle using only the variables from the tester which are dedicated to this shift cycle. The worst-case, most highly specified shift cycles tend to limit the level of test data compression because, even as the number of scan chains increases, the number of variables per shift cycle must remain sufficiently large to encode the most highly specified shift cycles.
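As a rough illustration of a linear combinational decompressor and its per-cycle encoding limit, the following Python sketch expands three tester channels into six scan chain inputs through XOR gates. The wiring in `TAPS`, the channel count, and the brute-force search in `encode_cycle` are assumptions made for the example and do not correspond to any circuit in the figures.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical wiring: scan chain i receives the XOR of the listed tester channels.
TAPS: List[Tuple[int, ...]] = [(0,), (1,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]

def decompress_cycle(channels: Sequence[int],
                     taps: Sequence[Tuple[int, ...]] = TAPS) -> List[int]:
    """Compute the bits loaded into all scan chains in one shift cycle."""
    out = []
    for tap in taps:
        bit = 0
        for ch in tap:
            bit ^= channels[ch]
        out.append(bit)
    return out

def encode_cycle(care_bits: Dict[int, int], n_channels: int = 3,
                 taps: Sequence[Tuple[int, ...]] = TAPS) -> Optional[List[int]]:
    """Find channel values producing the care bits {chain index: value} of one
    shift cycle, or None if this cycle is unencodable with this wiring."""
    for channels in product((0, 1), repeat=n_channels):
        out = decompress_cycle(channels, taps)
        if all(out[chain] == value for chain, value in care_bits.items()):
            return list(channels)
    return None
```

With this assumed wiring, `encode_cycle({0: 1, 3: 0})` finds a solution, but `encode_cycle({0: 1, 1: 1, 2: 1})` returns `None` because chain 2 always carries the XOR of chains 0 and 1. Only the variables of the current shift cycle are available, so a highly specified cycle can fail to encode even when other cycles are nearly empty.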
Sequential decompressors are usually linear finite-state machines including one or more shift registers, LFSRs, cellular automata and ring generators. Sequential decompressors allow variables from current and earlier shift cycles to be used for encoding care bits in the current shift cycle. As a result, the sequential decompressors provide a more diverse output space with fewer decompressor-imposed constraints than the combinational decompressors. In particular, linear decompressors generate a test sequence comprising a set of specified care bits {c0,c1, . . . ,cm−1}, also called a test cube C, if and only if a system of linear equations AV=C has a solution, where A is an m×n characteristic matrix of the linear decompressor and V={v0,v1, . . . ,vn−1} is a set of variables from the tester. The characteristic matrix for a linear decompressor is derived by symbolic simulation of the linear decompressor such that each symbol represents one variable. The encoding process for sequential decompressors typically requires solving a system of linear equations including one equation per specified care bit. More formally, the characteristic matrix is a binary matrix (comprising only 1s and 0s) such that each row corresponds to a care bit and each column corresponds to a variable from the tester. The entry in row i and column j in the characteristic matrix has the value 1 if and only if the i-th care bit depends on the j-th variable. After Gauss-Jordan elimination, all linearly independent rows are found. A solution exists if the superposition of each set of linearly dependent rows, including the corresponding care bit values, is equal to 0. If a solution does not exist then the test cube is unencodable. Clearly, test cubes having more specified care bits than the number of available variables from the tester are unlikely to be encodable.
However, if the number of variables is sufficiently larger than the number of specified care bits then the probability of not finding a solution (or having an encoding conflict) is negligible. The computational complexity of the described encoding process is O(nm²). As a result, the sequential linear decompressor schemes need to use static encoding, wherein test cubes are first generated, then checked for compatibility and finally encoded. In contrast, the simple decompressor schemes use dynamic encoding, wherein each specified care bit is immediately encoded during the branch-and-bound search so that all encoding conflicts are identified and resolved during ATPG. In addition, the simple decompressor schemes allow most of the necessary assignments or implications for a particular test cube (set of care bits) to be extracted from the decompressor-imposed constraints, which allows efficient pruning of the branch-and-bound search space.
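The encoding step for a linear decompressor can be sketched as Gauss-Jordan elimination over GF(2). The Python function below is a minimal illustration, assuming the characteristic matrix is given as a list of rows (one row per care bit, one column per tester variable); the name `solve_gf2` and the choice to set free variables to 0 are assumptions of the example, not part of any described tool.

```python
from typing import List, Optional

def solve_gf2(A: List[List[int]], C: List[int]) -> Optional[List[int]]:
    """Solve A.V = C over GF(2). Returns one solution V, or None if the
    test cube C is unencodable (an encoding conflict)."""
    m = len(A)
    n = len(A[0]) if A else 0
    rows = [A[i][:] + [C[i]] for i in range(m)]  # augmented matrix [A | C]
    pivots = []                                  # (row, column) of each pivot
    r = 0
    for col in range(n):
        if r == m:
            break
        # Find a not-yet-used row with a 1 in this column.
        p = next((i for i in range(r, m) if rows[i][col]), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        # Gauss-Jordan step: clear this column in every other row (XOR = add mod 2).
        for i in range(m):
            if i != r and rows[i][col]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[r])]
        pivots.append((r, col))
        r += 1
    # Remaining rows are linearly dependent; their care-bit side must also be 0.
    if any(rows[i][n] for i in range(r, m)):
        return None
    V = [0] * n  # free variables are arbitrarily set to 0
    for row, col in pivots:
        V[col] = rows[row][n]
    return V
```

For example, with `A = [[1, 1, 0, 0], [0, 1, 1, 0], [1, 0, 1, 0]]` the third row is the modulo-2 sum of the first two, so `C = [1, 0, 1]` is encodable while `C = [1, 0, 0]` yields `None`.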
Routing test data between the tester and a sequential decompressor scheme is based on static or dynamic reseeding. Decompressors based on dynamic reseeding typically receive one seed (or set of static variables having the same scope) per test pattern plus one or more dynamic variables per shift cycle via streaming tester channels, wherein both static and dynamic variables are mixed together in the decompressor scheme in order to maximize the encoding flexibility. From a tester standpoint, dynamic reseeding provides an elegant solution and avoids the need for any special scheduling and synchronization. Decompressor schemes based on dynamic reseeding typically receive a fixed number of test data bits per test pattern, which is determined such that both test coverage loss and test pattern inflation are minimized with respect to the conventional scan mode. A challenge for the decompressor schemes using dynamic reseeding is to maximize TDVR, since fewer care bits are required at the end of the test pattern set. In contrast, decompressor schemes based on static reseeding typically use multiple seeds per test pattern and seed overlapping. They can selectively encode as many care bits as needed while maintaining a high encoding efficiency (i.e., the ratio of successfully encoded care bits to the deployed bits from the tester). A challenge for the decompressor schemes using static reseeding is to minimize time overhead because reseeding may delay shift operations.
In addition, the decompressor schemes can use a combination of test data, control data and correlations. In particular, the encoding efficiency of the static and dynamic reseeding cannot exceed the value of 1. The encoding efficiency can be increased above this value using a clustering approach based on the fact that many faults require similar but incompatible test cubes. More formally, test cube clustering uses three test sequences: a parent test sequence and a control test sequence for a cluster of test patterns as well as an incremental test sequence for each test pattern. Accordingly, the test cubes are divided into clusters such that the number of compatible care bits in the test cubes of each cluster is maximized. Next, the parent test sequence is responsible for encoding compatible care bits for each cluster, the incremental test sequence is responsible for encoding the remaining incompatible care bits for each test pattern and the control test sequence determines which test sequence (incremental or parent) is used for a particular care bit and cluster of test patterns. The encoding efficiency of the test cube clustering is improved based on the fact that the control and parent test sequences are the same (or valid) for a cluster of test patterns. A challenge for the decompressor schemes based on the test cube clustering is to reduce the bandwidth between the decompressor and the tester because the parent and control test sequences need to be repeated for each test pattern in a cluster.
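The division of labor among the parent, control and incremental test sequences can be sketched as follows. This Python example is only an illustration under simplifying assumptions: test cubes are strings over `0`, `1` and `X` (don't care), the cluster is already given, and the helper names `split_cluster`, `rebuild` and `covers` are hypothetical.

```python
from typing import List, Tuple

def split_cluster(cubes: List[str]) -> Tuple[str, List[int], List[str]]:
    """Split a cluster of test cubes into a parent sequence (compatible care
    bits), a control sequence (1 where cubes conflict), and one incremental
    sequence per cube (the remaining incompatible care bits)."""
    parent, control = [], []
    for column in zip(*cubes):
        values = {b for b in column if b != "X"}
        if len(values) <= 1:            # compatible position: shared by the cluster
            parent.append(values.pop() if values else "X")
            control.append(0)
        else:                           # incompatible position: per-pattern data
            parent.append("X")
            control.append(1)
    incrementals = [
        "".join(bit if sel else "X" for bit, sel in zip(cube, control))
        for cube in cubes
    ]
    return "".join(parent), control, incrementals

def rebuild(parent: str, control: List[int], incremental: str) -> str:
    """Select, per position, the incremental bit (control = 1) or the parent bit."""
    return "".join(inc if sel else par
                   for par, sel, inc in zip(parent, control, incremental))

def covers(pattern: str, cube: str) -> bool:
    """True if the pattern satisfies every specified care bit of the cube."""
    return all(c == "X" or c == p for p, c in zip(pattern, cube))
```

For the cluster `["1X0X1", "1X1XX", "XX0X1"]`, only the third position conflicts, so the parent sequence `1XXX1` and the control sequence are shared by all three patterns and each incremental sequence carries a single care bit.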
Serial shift registers are used to transform serial test data from the tester into parallel test data (or seeds) for the decompressor scheme based on the following facts: 1) tester speed is typically higher than the speed of the scan-in shift operation; and 2) the test costs are estimated by the number of required streaming tester channels per die. As a result, bandwidth of a streaming tester channel is typically improved by a factor of 4, 6 or 8 test data bits per shift cycle.
Last but not least, the decompressor schemes need to minimize the scan-in switching rate during a shift operation in order to reduce power dissipation in the test mode. High power dissipation during test may result in either overheating or supply voltage noise—either of which can cause a device malfunction leading to loss of yield, reliability degradation, or even permanent damage of the CUT. A peak in scan-in switching is estimated by the toggling rate in the decompressed test stimuli during the last shift cycle. A more accurate estimate of the scan-in switching is based on a weighted transition metric (WTM) which takes into account the number of invoked transitions in decompressed test bits which are loaded in successive scan cells and their relative positions. In FIGS. 2(d) and 2(e), a shadow register is added between either a tester or decompressor scheme and scan chains to reduce toggling in the decompressed test stimuli. Particularly, the shadow register receives a control sequence that allows loading the same decompressed test bits in consecutive shift cycles. Also, a shadow register may have multiple segments such that each segment may selectively receive decompressed test bits from the decompressor schemes.
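The two switching estimates and the effect of a shadow register can be sketched in Python. The text above does not give a formula for the WTM; the function `wtm` below uses one commonly cited formulation, in which a transition between successive scan-in bits is weighted by the number of shift cycles it travels through the chain, and should be read as an assumption. The functions `toggling_rate` and `apply_shadow`, and the per-cycle `hold` control bits, are likewise hypothetical simplifications.

```python
from typing import List

def toggling_rate(prev_slice: List[int], curr_slice: List[int]) -> float:
    """Fraction of scan chain inputs that change between two shift cycles."""
    toggles = sum(a != b for a, b in zip(prev_slice, curr_slice))
    return toggles / len(curr_slice)

def wtm(scan_in_bits: List[int]) -> int:
    """Weighted transition metric for one scan chain (assumed formulation):
    a transition between bits j and j+1 (0-indexed, bit 0 shifted in first)
    is weighted by L - 1 - j, the number of cells it passes through."""
    L = len(scan_in_bits)
    return sum((L - 1 - j) * (scan_in_bits[j] ^ scan_in_bits[j + 1])
               for j in range(L - 1))

def apply_shadow(slices: List[List[int]], hold: List[int]) -> List[List[int]]:
    """Model a shadow register: when hold is 1 the previous slice is reloaded
    into the scan chains; otherwise the new decompressed slice is loaded."""
    shadow = [0] * len(slices[0])
    out = []
    for new_slice, h in zip(slices, hold):
        if not h:
            shadow = list(new_slice)
        out.append(list(shadow))
    return out
```

Under this formulation, `wtm([1, 0, 0, 1])` is 4 while `wtm([1, 1, 0, 0])` is 2, reflecting that a transition shifted in early toggles more scan cells. Holding the shadow register in a shift cycle makes the toggling rate for that cycle zero, at the cost of constraining which care bits can be encoded there.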
In summary, an advantage of the complex decompressors is high encoding efficiency and a more diverse output space with fewer decompressor-imposed constraints. An advantage of the simple decompressors is dynamic encoding wherein the encoding process is incorporated into the ATPG implication process to exploit this degree of freedom. As a result, the probability of encoding a particular test cube with a complex decompressor is higher because it has a more diverse output space than simple decompressor schemes. However, the fact that most faults can be detected by many different test cubes provides an additional degree of freedom during ATPG for aggressive test data compression and efficient dynamic test pattern compaction. In addition, desirable properties include a simple and flexible interface for routing test data from the tester to multiple units of decompressor circuitry as well as a simple and flexible mechanism to reduce the toggling rate in the decompressed test stimuli.