The present invention relates, generally, to the field of Design Automation, and more specifically, to the design of complex combinatorial and sequential logic circuits embodied in very large scale integration (VLSI) circuit devices.
Faults occurring anywhere in VLSI circuit devices can have their effects propagated through a number of feedback loops formed of storage or memory elements in the sequential logic before reaching a testable output of the device. Level Sensitive Scan Design (LSSD) rules were devised to eliminate the testing complications caused by propagation through these feedback loops. As described by E. B. Eichelberger and T. W. Williams in an article entitled “A Logic Design Structure for LSI Testability”, Proceedings of the 14th Design Automation Conference, pages 462–468, LSSD rules impose a clocked structure on the memory elements of logic circuits and require that these memory elements be interconnected to form a shift register scan path, so that every memory element is accessible as both a test input point and a test output point. With the scan path, test input signals can therefore be introduced, and test results observed, wherever a memory element occurs in the logic circuit. Being able to enter the logic circuit at any memory element to introduce test signals or observe test results allows the combined combinational and sequential logic to be treated as much simpler combinational logic for testing purposes, which considerably simplifies test pattern generation and analysis.
Single or multiple scan paths can be provided under the LSSD rules. Practitioners of the art will readily recognize that control means can be provided for LSSD scan circuits to switch between single or multiple path modes of operation.
LSSD has now become the industry standard scan design methodology. It uses separate test clocks (vs. functional clocks) for scanning data in or out of latches for test and debugging purposes. This technique is used widely across the industry for many microprocessor designs, as well as in ASICs. A segment of a conventional LSSD register is illustrated in FIG. 1 for purposes of discussion. Referring to FIG. 1, the sample LSSD register segment is seen to be composed of master (100, 101) and slave (200, 201) latches, with two input ports to the master. In the scheme shown, when the system clock is activated, system data is clocked in and out of the master latches (100, 101). This is typical of normal “functional mode” or “system mode”. However, with the system clock input in an inactive state, if Shift A clock is activated, then the scan_data_out from a previous slave latch in the register is clocked in and out of the master latches. Shift B clock is energized after Shift A clock becomes inactive, clocking the data in and out of the slave latches (200, 201). By successively activating Shift A clock and Shift B clock in this fashion, data can be moved along through the register, step by step, in an orderly manner. There are many types and forms of LSSD latches used in the industry. The exact configuration of the LSSD latch is not important to this discussion, but it is important to understand how the LSSD latches and registers are used in typical LSSD designs.
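The two-phase shifting just described can be captured in a small bit-level model. The sketch below is a hypothetical Python model (the class and method names are illustrative, not taken from FIG. 1); it represents each stage as a master/slave pair and treats the system clock, Shift A, and Shift B as discrete, non-overlapping events, ignoring all timing.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LSSDRegister:
    """Hypothetical bit-level model of an LSSD register segment.

    Each stage has a master latch and a slave latch. Clocks are applied
    as discrete, non-overlapping events; no timing is represented.
    """
    masters: List[int]
    slaves: List[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.slaves:
            self.slaves = [0] * len(self.masters)

    def system_clock(self, system_data: List[int]) -> None:
        # Functional mode: each master captures its system data input.
        self.masters = list(system_data)

    def shift_a(self, scan_data_in: int) -> None:
        # Shift A: master i captures the scan_data_out (slave) of stage i-1;
        # the first master captures the register's scan_data_in.
        self.masters = [scan_data_in] + self.slaves[:-1]

    def shift_b(self) -> None:
        # Shift B: each slave captures its own master.
        self.slaves = list(self.masters)

    @property
    def scan_data_out(self) -> int:
        # The register's serial output is the slave of the last stage.
        return self.slaves[-1]


reg = LSSDRegister(masters=[0, 0, 0])
reg.system_clock([1, 0, 1])  # functional capture into the masters
reg.shift_b()                # slaves now hold [1, 0, 1]
reg.shift_a(scan_data_in=0)  # masters now hold [0, 1, 0]
reg.shift_b()                # slaves now hold [0, 1, 0]
```

In this model one Shift A followed by one Shift B advances the register contents by exactly one stage, which is what lets data move through the register step by step.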
The principle of operation of an LSSD register is as follows. In normal “functional mode”, also referred to as “system mode”, data is clocked in and out of the latches, as described above, synchronized by a high-speed clock (the system clock) that is distributed to all the latches across the design. For test and debug purposes, it is desirable to stop the system clock and read out the data stored in the registers to verify that the system is functioning correctly. It is also desirable to load specific data values into the registers, so that when the system clock is restarted, known data values are presented to the system logic receiving inputs from the registers. In an LSSD system, once the system clock has been deactivated, either locally by way of controls at the individual registers or globally (switched off for the entire system), both of these capabilities can be exercised. The data in all the LSSD registers is read out by way of successive Shift A and Shift B clock pulses, as described above. For instance, after the system clock is stopped, a Shift B clock pulse transfers the system data captured in each master latch to its slave, and hence to that latch's scan_data_out port. Shift A must remain inactive while Shift B is active; otherwise, data will “race” down the chain in an uncontrollable fashion. Next, a Shift A pulse (with Shift B inactive) clocks the data present at each scan_data_out port into the master of the next latch in the register. In this manner, data is shifted serially through the register by successive clock pulses. By monitoring the final scan_data_out port of the register, it is possible to observe the data originally captured in each latch, as each successive Shift B clock pulse presents a new data value at the output of the register. It is equally possible to load new data values into the LSSD register in the same fashion, starting from the first scan_data_in input of the register.
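The read-out sequence can be sketched as a function that returns the values seen at the final scan_data_out. This is a hypothetical functional model (the name scan_unload and the use of 0 as the filler value on scan_data_in are assumptions); it shows that an n-stage register needs one Shift B pulse followed by n - 1 Shift A/Shift B pairs, and that the captured values emerge last stage first.

```python
from typing import List


def scan_unload(masters: List[int]) -> List[int]:
    """Observe the values captured in a stopped LSSD register by scanning.

    Hypothetical model: masters[i] is the system data captured in stage i
    (stage 0 is nearest scan_data_in). Returns the values seen at the
    register's final scan_data_out, in the order they emerge.
    """
    if not masters:
        return []
    # First Shift B: every slave copies its master, so the last stage's
    # captured value appears at the register's scan_data_out.
    s = list(masters)
    observed = [s[-1]]
    for _ in range(len(masters) - 1):
        # Shift A (Shift B inactive): each master takes the previous stage's
        # scan_data_out; the first takes scan_data_in (0 assumed here).
        m = [0] + s[:-1]
        # Shift B (Shift A inactive): each slave takes its master.
        s = list(m)
        observed.append(s[-1])
    return observed
```

For example, scan_unload([1, 0, 1, 1]) returns [1, 1, 0, 1], i.e. the captured values in reverse stage order.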
To load data, the desired data value is applied to the first scan_data_in input of the register; the first Shift A pulse captures this bit in the first latch of the register, and a subsequent Shift B pulse shifts it to the scan_data_out port of the first latch. A second data value is then applied to the scan_data_in input, and the second Shift A pulse loads it into the first latch while the previous value is simultaneously loaded into the second latch of the register. This process continues until the desired data has been written into all the latches in the register; the first value applied thus ends up in the last latch.
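Loading can be sketched in the same way. In this hypothetical model (scan_load is an illustrative name), values[i] is the value intended for stage i, counting from scan_data_in; because each Shift A/Shift B pair advances the data by one stage, the value for the last stage has to be applied first, and n pairs are needed for an n-stage register.

```python
from typing import List


def scan_load(values: List[int]) -> List[int]:
    """Load 'values' into an LSSD register through scan_data_in.

    Hypothetical model: values[i] is the value intended for stage i
    (stage 0 is nearest scan_data_in). Returns the final slave contents.
    """
    s = [0] * len(values)  # prior slave contents; they are shifted out
    # Apply the value destined for the last stage first.
    for bit in reversed(values):
        m = [bit] + s[:-1]  # Shift A: capture scan_data_in and previous stages
        s = list(m)         # Shift B: transfer masters to slaves
    return s
```

Registers strung together scan_data_out to scan_data_in behave, in this model, like one longer list, so the same function covers a chained string of registers.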
Following the above discussion, it can be seen that LSSD registers can be of any size, or multiple registers can be strung together in series, with the final scan_data_out of one register connected to the first scan_data_in port of the next. In this manner, in an LSSD system, all the latches across the entire design are made observable and configurable, so that all the logic can now be tested and observed. This is shown schematically in FIG. 2, illustrating a conventional LSSD design.
For an LSSD configuration, a single stuck-fault model is used to generate the test patterns applied to the circuit, and the output responses are collected after each test for comparison with precalculated “good circuit” responses. It has been shown that such stuck-fault test generation belongs to a class of difficult mathematical problems called NP-complete, where NP stands for nondeterministic polynomial time, and complete means that an efficient (polynomial-time) solution for any one problem in the class could be extended to all of them. No such efficient solution is known for any NP-complete problem, and the number of candidate solutions grows dramatically as the size of the problem increases. The implication is that, in the worst case, test generation computer time increases exponentially with the size of the circuit. Consequently, the best stuck-fault test generation algorithms appear to be computationally feasible only for fairly small or fairly simple networks, and fault-oriented approaches become prohibitively expensive with the increasing circuit density of VLSI chips and modules.
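The stuck-fault test generation problem can be illustrated on a toy network. The sketch below models y = (a AND b) OR c with one internal net n1, injects a single stuck-at fault by overriding one net, and searches all 2^n input patterns for one whose faulty response differs from the good-circuit response. Exhaustive search is used only to make the 2^n growth visible; practical test generators use smarter searches such as the D-algorithm or PODEM, whose worst-case cost nonetheless still grows exponentially with circuit size. The circuit and function names are hypothetical.

```python
from itertools import product
from typing import Dict, Optional, Tuple

Fault = Optional[Tuple[str, int]]  # (net name, stuck value), or None for the good circuit


def circuit(a: int, b: int, c: int, fault: Fault = None) -> int:
    """Toy network y = (a AND b) OR c with internal net 'n1' (hypothetical)."""
    nets: Dict[str, int] = {"a": a, "b": b, "c": c}

    def val(name: str) -> int:
        # A stuck-at fault overrides whatever value the named net would carry.
        if fault is not None and fault[0] == name:
            return fault[1]
        return nets[name]

    nets["n1"] = val("a") & val("b")
    nets["y"] = val("n1") | val("c")
    return val("y")


def find_test(fault: Fault) -> Optional[Tuple[int, ...]]:
    # Try all 2**3 input patterns; return the first that distinguishes the
    # faulty circuit from the good circuit at the output.
    for pattern in product((0, 1), repeat=3):
        if circuit(*pattern) != circuit(*pattern, fault=fault):
            return pattern
    return None  # no pattern detects the fault (it is redundant)
```

For n1 stuck-at-0, the search returns (1, 1, 0): a = b = 1 sets n1 to 1 in the good circuit, and c = 0 lets that value reach y. With n primary inputs the loop runs up to 2^n times per fault.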
For these reasons, it has been suggested that self-testing be employed in connection with LSSD to reduce both the time required to generate test patterns and the time required to perform the actual test. Self-testing uses pseudo-random pattern generators and response compression structures that are built into the logic circuit devices. Because the patterns are generated and the responses compressed on the same device that contains the logic, no computer time is needed to generate the tests. It thus becomes possible to apply vast numbers of test patterns to the circuits in a reasonable period of time.
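A common way to build the pseudo-random pattern generator mentioned above is a linear feedback shift register (LFSR). The sketch below is a 4-bit Galois LFSR; the width, feedback mask, and seed are hypothetical choices, picked because this combination is maximal-length and therefore cycles through all 15 nonzero states. Production generators are much wider.

```python
from itertools import islice
from typing import Iterator


def lfsr_patterns(seed: int = 0x1, mask: int = 0xC, width: int = 4) -> Iterator[int]:
    """Pseudo-random pattern generator built from a right-shifting Galois LFSR.

    With the hypothetical defaults (width=4, mask=0xC) the LFSR is
    maximal-length: it visits all 2**4 - 1 = 15 nonzero states before repeating.
    """
    state = seed & ((1 << width) - 1)
    if state == 0:
        raise ValueError("an all-zero seed would lock the LFSR at zero")
    while True:
        yield state
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= mask  # apply the feedback taps


# Each 4-bit state could be shifted into a 4-latch scan chain as one pattern.
first_period = list(islice(lfsr_patterns(), 15))
```

The sequence starts 1, 12, 6, 3, and the sixteenth state equals the first, so the generator repeats with period 15.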
Referring back to FIG. 2, a test control unit (TCU) (100) is shown providing the signals and functionality needed for LBIST (Logic Built-In Self Test) to test the logic throughout the chip. The TCU provides non-overlapping Shift A and Shift B clock pulses that are distributed to all the LSSD latches and registers (300 through 303) on the chip. In addition, the TCU provides the serial data that is loaded into the registers for test purposes and receives the serial scan_out data from the string of registers, so that the data can be analyzed (i.e., checked to ensure that the data received matches expectations). The combinatorial logic (400) is therefore tested by initially loading all the LSSD registers with known values from the TCU, running some number of system clock cycles, reading out all the register data, and comparing it in some fashion with the expected data. FIG. 2 also illustrates a “bypass” mode. If there is a problem with the TCU, or if it is suspected that the TCU is not operating correctly, it may be desirable to provide independent Shift A and Shift B clock inputs from an off-chip source (i.e., through a primary input). This is accomplished by way of multiplexer (200), which selects the source of the Shift A and Shift B clocks. The scan_out data can also be sent to a chip output to be observed directly, and it may also be desirable to provide a separate scan_in input from off-chip.
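The load, run, unload, and compare flow can be sketched end to end. In this hypothetical model, a fixed list of patterns stands in for the on-chip pattern generator, a one-line XOR function stands in for the combinational logic (400), and the unloaded responses are compressed into a 4-bit signature by a multiple-input signature register (MISR) rather than compared bit by bit. Scan-in and scan-out are collapsed into whole-word transfers. A real MISR can alias, mapping a faulty response onto the good signature with small probability; a 4-bit toy is far more prone to this than a production-width register.

```python
from typing import Callable, List

Bits = List[int]


def misr_step(signature: int, word: int, mask: int = 0xC, width: int = 4) -> int:
    # One MISR step (hypothetical width and mask): advance a Galois LFSR by
    # one position, then XOR in the parallel response word.
    lsb = signature & 1
    signature >>= 1
    if lsb:
        signature ^= mask
    return (signature ^ word) & ((1 << width) - 1)


def toy_logic(state: Bits) -> Bits:
    # Stand-in for the combinational logic (400): each latch captures the
    # XOR of its own value and its left neighbour's (wrapping around).
    return [state[i] ^ state[i - 1] for i in range(len(state))]


def faulty_logic(state: Bits) -> Bits:
    # Same logic with the output feeding latch 0 stuck-at-0.
    out = toy_logic(state)
    out[0] = 0
    return out


def lbist_signature(patterns: List[Bits], logic: Callable[[Bits], Bits]) -> int:
    signature = 0
    for pattern in patterns:
        captured = logic(pattern)                   # load pattern, fire one system clock
        word = int("".join(map(str, captured)), 2)  # unload the captured bits
        signature = misr_step(signature, word)      # compress the response
    return signature


# Fixed patterns standing in for the output of an on-chip pattern generator.
PATTERNS = [[1, 0, 0, 0], [0, 1, 1, 0], [1, 1, 0, 1], [0, 0, 1, 1]]
GOLDEN = lbist_signature(PATTERNS, toy_logic)  # precomputed fault-free signature
```

Here the fault-free signature is 13 and the stuck-at-0 version yields 4, so a single 4-bit comparison at the end of the run flags the fault.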
Practitioners of the art will readily recognize that actual LSSD systems are highly complex, considerably more so than the illustrative example shown in FIG. 2. A chip (or multi-chip module, board, frame, and the like) may contain hundreds of thousands of LSSD latches, typically divided into multiple strings of registers. Generally, registers provide inputs to, and receive outputs from, any given block of combinatorial logic. However, the basic principles illustrated in FIG. 2 remain unchanged.
There are several issues and problems in designing LSSD systems. As described previously, the Shift A and Shift B clock signals must be distributed to every LSSD latch and register across the entire chip, as shown schematically in FIG. 3. The characteristics of the Shift A and Shift B clocks received at, e.g., register 307 are quite different from those at register 300, since cross-chip wires typically have large delays and may require the insertion of many buffering stages (not shown). These buffering stages must be added to maintain acceptable Shift A and Shift B clock waveforms at each latch. Typically, buffers are added at every major branching point in each distribution tree to isolate the capacitance of all the branches from the resistance of the long “trunk”. In addition, even straight-line wires are broken into discrete segments (perhaps 1 mm in length) to avoid excessive RC delay in the wire, poor signal slew, extreme sensitivity to coupled noise, and other disturbances on the chip. Given that actual chips have hundreds of thousands of LSSD latches, extremely complicated routing and buffering trees can be expected for the Shift A and Shift B clock signals. Moreover, one must ensure that during scan testing the Shift A and Shift B clocks never overlap anywhere (i.e., are never both on at the same time), and that on every Shift A and Shift B clock cycle each clock individually remains high long enough to write new data into the master or slave sections of all the latches.
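The two requirements at the end of the preceding paragraph (no overlap anywhere, and a sufficiently wide pulse at every latch) can be checked per sink under a deliberately simple timing model. In this hypothetical sketch, each sink has its own Shift A and Shift B insertion delays and a pulse-width loss applied to the falling edge; the numbers used for registers 300 and 307 are invented for illustration, not taken from FIG. 3, and only the Shift A-to-Shift B transition is checked.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sink:
    name: str
    delay_a_ps: float  # Shift A insertion delay from the TCU (hypothetical)
    delay_b_ps: float  # Shift B insertion delay from the TCU (hypothetical)
    shrink_ps: float   # pulse-width loss in the buffer tree, taken off the falling edge


def check_sink(sink: Sink, width_ps: float, gap_ps: float, min_width_ps: float) -> List[str]:
    """Source waveform: Shift A high for width_ps, both low for gap_ps,
    then Shift B high for width_ps. Returns violations seen at this sink."""
    problems = []
    a_fall = sink.delay_a_ps + width_ps - sink.shrink_ps  # Shift A falls here
    b_rise = sink.delay_b_ps + width_ps + gap_ps          # Shift B rises here
    if b_rise < a_fall:
        problems.append(f"{sink.name}: Shift A and Shift B overlap")
    if width_ps - sink.shrink_ps < min_width_ps:
        problems.append(f"{sink.name}: pulse too narrow to write the latch")
    return problems


near = Sink("reg300", delay_a_ps=50, delay_b_ps=60, shrink_ps=5)
far = Sink("reg307", delay_a_ps=400, delay_b_ps=150, shrink_ps=40)
```

With a 500 ps pulse and a 100 ps gap, the near sink passes but the far sink sees Shift B rise at 750 ps while Shift A is still high until 860 ps; widening the gap to 300 ps clears the overlap, at the cost of a slower test clock.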
Now, as microprocessor (and ASIC chip) frequencies continue to increase, as technology scaling provides ever higher circuit densities, and as chips continue to grow in size, it is becoming increasingly difficult to distribute the Shift A and Shift B clocks at very high frequencies, given the design constraints described previously. There are a number of fundamental reasons for this.
First, in order to continue improving the operating frequency of these chips, a common technique is to divide the logic into finer and finer “slices”, each slice preferably separated from the next by a latch element. In this way, the logic depth between any two latches decreases, reducing the time required for a signal to propagate from the launching latch through the logic and into the capturing latch. The shorter propagation delay translates into a higher operating frequency. However, this technique leads to a proliferation in the number of latches on the chip. Since the test clocks have to be distributed to each latch, increasing the number and density of latches in this manner increases the complexity of the test clock distribution, causing the overall propagation time (and the associated timing uncertainties) to increase as well. The larger uncertainties associated with these larger delays imply that the Shift A and Shift B clocks have to be spaced further apart to ensure non-overlap, and that clock pulses must be made wider at the source to ensure that they remain sufficiently wide at all the sinks (i.e., the LSSD latches). These requirements effectively limit the frequency of the test clocks. In addition, while segmenting the logic into ever finer slices increases the chip frequency, it does not help to speed up the distribution of the test clocks (it actually hinders it, as described above, since the number of latches increases). Thus, the imbalance between the microprocessor frequency and the test clock frequency tends to grow when this technique is used.
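The effect on test clock frequency can be made concrete with a simplified model. In this hypothetical sketch, one test clock cycle consists of a Shift A pulse, a gap, a Shift B pulse, and a second gap; each source pulse is widened by the worst pulse-width loss in the distribution tree, and each gap conservatively covers the worst difference between Shift A and Shift B arrival times. The parameter names and numbers are assumptions.

```python
def min_test_clock_period_ps(min_width_ps: float, max_shrink_ps: float,
                             max_skew_ps: float) -> float:
    """Shortest Shift A/Shift B cycle under a simple hypothetical model.

    Cycle = A pulse + gap + B pulse + gap. Each source pulse is widened by
    the worst pulse-width loss so it still writes every latch; each gap
    covers the worst Shift A vs. Shift B arrival-time difference so the
    clocks never overlap at any sink.
    """
    width = min_width_ps + max_shrink_ps
    gap = max_skew_ps
    return 2 * width + 2 * gap


def frequency_mhz(period_ps: float) -> float:
    # 1 / period, converted from picoseconds to megahertz.
    return 1e6 / period_ps
```

With a 400 ps minimum width, 40 ps worst-case loss, and 250 ps worst-case skew, the period is 1380 ps (about 725 MHz); doubling the skew to 500 ps stretches it to 1880 ps (about 532 MHz), even though nothing about the functional clock has changed.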
The second reason is related to fundamental VLSI scaling principles. As technology scaling continues, both wire widths and wire heights are reduced according to the technology scale factor, so the wire cross-sectional area decreases in proportion to the square of the lithographic length scale. This means that the resistance per unit length of wire increases as the inverse square of the lithographic dimension. The capacitance per unit length tends to stay about constant, since the decreasing space between adjacent wires is countered by the decreasing profile of the wires. Therefore, even if the chip size remains constant as the lithographic dimension decreases, the RC delay through the wires increases rapidly, necessitating the insertion of larger numbers of buffering elements to maintain acceptable clock waveforms and slews. As a result of these scaling principles, the propagation delay through the test clock network tends to be magnified, and the larger uncertainties associated with these larger delays again limit the frequency of the test clocks.
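The scaling argument can be checked numerically. The sketch below assumes, as the text does, that resistance per unit length scales as 1/s² for a lithographic scale factor s and that capacitance per unit length is constant; since the delay of an unbuffered distributed RC wire of length L is proportional to r*c*L², a fixed-length wire gets slower by a factor of 1/s². The 1 mm segment budget echoes the segment length mentioned earlier; the function names are illustrative.

```python
import math


def relative_wire_delay(scale: float) -> float:
    """RC delay of a fixed-length wire relative to the unscaled wire.

    Assumes r per unit length ~ 1/scale**2 and c per unit length ~ constant,
    so delay ~ r * c * L**2 ~ 1/scale**2 when L is unchanged.
    """
    r = 1.0 / scale**2
    c = 1.0
    return r * c  # L**2 is the same before and after, so it cancels


def segments_needed(length_mm: float, scale: float, max_segment_mm: float = 1.0) -> int:
    """Buffered segments needed so each segment keeps the RC delay that a
    max_segment_mm segment had before scaling (hypothetical budget).

    Per-segment delay ~ r * l**2; with r grown by 1/scale**2, the allowed
    segment length l must shrink by the factor scale.
    """
    allowed_mm = max_segment_mm * scale
    return math.ceil(length_mm / allowed_mm)
```

One generation at s = 0.7 roughly doubles the delay of a fixed wire (1/0.49 ≈ 2.04), and a 10 mm route that needed 10 buffered 1 mm segments needs 15 after one such shrink.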
Finally, technology scaling has led to ever greater circuit density (including latch density) on the chip. Thus, in addition to the two factors described above, the increased number and density of latches greatly increases the difficulty of the test clock distribution problem. Again, more buffering stages need to be added as the distribution tree becomes ever more complex. This extra delay leads to additional uncertainty and disparity in the clock arrival times at the latches, which again negatively affects the maximum test clock frequency.
The above implies that relatively large intervals must separate the Shift A and Shift B clock pulses, and that each test clock must remain active for a longer period of time to ensure propagation of a sufficiently wide pulse. Typically, the test clock frequency is specified as some fraction of the functional clock frequency; a typical ratio of functional to test clock frequency is of the order of 16:1. Therefore, if a chip were designed to run at 4.8 GHz, its test clocks would typically run at 300 MHz.
Accordingly, the conventional scan clock test methodology of this type is not ideal in various ways. First, LBIST time is relatively long, because the rate at which data is scanned through all the LSSD registers is limited by the slow test clocks. Second, since scanning occurs at a low frequency, power is relatively low during scanning, but the chip sees large power transients when the functional clocks are fired during LBIST testing. Third, the timing relationships among the Shift A and Shift B clock distributions, the functional clock distribution, and the test control signal distribution are unknown and vary across the chip. Therefore, a gap needs to be provided between operation in test mode (i.e., with the Shift A and Shift B clocks toggling) and operation in functional mode with the functional clocks (e.g., during LBIST testing). This gap further exacerbates the power transients during test and limits the testability or test strategy in certain ways. Finally, since the Shift A and Shift B clocks run at different frequencies from the functional clocks, a separate timing verification process is usually needed, possibly even requiring a separate set of timing models, all of which requires considerable additional work from the design team.
A known solution to the problem described above is the so-called GSD (general scan design) methodology. As commonly implemented, this methodology uses the functional clock for all clocking (both scanning and functional operation). The mode of operation is set by control signals that select which data to write into the latch (either functional system data or test data). This technique solves the problems mentioned above but introduces new issues of its own. First, there is no capability for driving separate test clocks into the chip from external inputs (since there is no longer a separate test clock). Second, a separate Mux (multiplexer) needs to be inserted in front of each scannable latch to choose between scan data and functional data. This increases the delay and/or the power consumption, and possibly the area. The Mux can be merged into a latch as a separate data port, in which case a scan clock needs to be generated locally from the functional clock (with a select control to choose which clock to fire). In that case, the scan data ports on the latches must be sufficiently large to guarantee writing the scan data with a single test clock pulse, even for a worst-case duty cycle (i.e., minimum clock pulse width). If the scan port on the latch is too slow, it limits the frequency of the AC test (since the test clock is generated locally from the functional clock, it always runs at the same frequency as the functional system clock). In addition, for a typical master-slave flip-flop, three clocks need to be generated from the system clock (c1 and c2 for functional purposes, and a “scan c1” for clocking the scan port).
This third clock increases the loading on the clock mesh by approximately 50% compared with the LSSD architecture, all other constraints remaining the same (LSSD requires only the two functional clocks to be generated from the system clock), and also increases the power consumed, depending on the configuration used to select between the c1 and scan c1 clocks. Further, the GSD arrangement generally requires two at-speed AC test control signals to be distributed across the chip, compared with only one for most LSSD applications. Furthermore, if any of these control signals is slower than expected or than the logic it serves, or fails to meet the full-frequency timing requirement anywhere in its distribution network, the frequency at which the chip can be tested will be limited.
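The GSD arrangement with a multiplexer in front of each scannable latch can be sketched functionally. This hypothetical model (MuxScanFlop and shift_chain are illustrative names) covers only the separate-Mux variant, not the merged-port variant with a locally generated scan c1 clock; it shows a single functional clock serving both modes, with a scan-enable control choosing the data source. In hardware that control must meet full-frequency timing everywhere, which this purely logical model does not capture.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MuxScanFlop:
    """Hypothetical GSD-style scannable flip-flop: a 2:1 mux in front of D
    selects functional data or scan data, and the same functional clock
    captures either one."""
    q: int = 0

    def clock(self, d: int, scan_in: int, scan_enable: bool) -> int:
        # The mux sits in the functional data path; in hardware this is
        # where the added delay (and power/area) comes from.
        self.q = scan_in if scan_enable else d
        return self.q


def shift_chain(flops: List[MuxScanFlop], scan_in: int) -> int:
    # One functional clock edge with scan_enable asserted on every flop:
    # each flop captures its predecessor's old Q (functional data d is
    # ignored), so the chain advances one position per edge, the GSD
    # counterpart of one Shift A/Shift B pair.
    old = [f.q for f in flops]
    for i, f in enumerate(flops):
        f.clock(d=0, scan_in=scan_in if i == 0 else old[i - 1], scan_enable=True)
    return flops[-1].q  # value now at the chain's scan output


chain = [MuxScanFlop() for _ in range(3)]
for bit in (1, 0, 1):
    shift_chain(chain, bit)  # chain Q values end up as [1, 0, 1]
```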