With some integrated circuits growing to include billions of transistors, it is virtually impossible to design them flat (with no partitioning); Electronic Design Automation (EDA) tools would not be able to process them efficiently. Additionally, there is significant reuse of Intellectual Property (IP) from one design to another. Large designs, known as Systems-On-A-Chip (SOCs), include a large number of “cores” that are used as building blocks (also referred to as circuit blocks). Each core is usually designed and validated individually first, then integrated with other cores to form the entire SOC. This is known as hierarchical design. Ideally, as much of the design and validation work as possible is done at the core level, which is smaller and more manageable, leaving only integration and chip-level validation to be done at the top level. Work done at the core level can also be performed earlier, while the core is being developed, which moves it out of the critical path when the SOC comes together close to the tape-out deadline.
As designs have grown, the number of levels of core hierarchy has grown as well. Hierarchical design started with two levels of hierarchy: The core level and the chip/top level. Increasingly, cores are first integrated into larger sized cores or sub-systems, then integrated into the chip. This represents three levels of core hierarchy. Some large designs can have even more levels of core hierarchy.
Just as design has adopted a hierarchical approach to manage complexity, so has scan test. In hierarchical test methodologies, the scan chains and compression logic are inserted into every core. The test patterns are generated and validated at the core level to test most of the logic in the core. Subsequently, the patterns from multiple cores are retargeted, or mapped, to the top level, where they are merged with the retargeted patterns of other cores that will be tested at the same time. In addition to retargeting the patterns generated for most of the content of each core, test pattern generation is also run at the next level up to test the peripheral logic between the cores, as well as any logic at that level that is involved in integrating the cores. If this higher level is not the chip level, those patterns must in turn be retargeted to the chip level.
The same test pattern generation and retargeting methodology is applied recursively regardless of the number of hierarchy levels, but with conventional scan access methods the planning and implementation of design for test (DFT) become more complex with each additional level of hierarchy.
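The recursive part of retargeting can be sketched as follows. This is a minimal illustration that only tracks connectivity (which chip-level pin a core-level channel is reached from); real retargeting also accounts for timing, pipelining and merging, which are not modeled here. The `Core` class, its `channel_map` field and all channel and pin names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Core:
    """One node of a hypothetical core hierarchy model.

    channel_map maps each of this core's channel names to the channel or
    pin name it is connected to one level up; the chip (root) has no parent.
    """
    name: str
    parent: Optional["Core"] = None
    channel_map: Dict[str, str] = field(default_factory=dict)


def retarget(core: Core, channel: str) -> str:
    """Follow a core-level channel up the hierarchy to a chip-level pin.

    Each call applies the same one-level mapping step, so the procedure is
    unchanged whether there are two, three, or more levels of hierarchy.
    """
    if core.parent is None:
        return channel  # reached the chip level
    return retarget(core.parent, core.channel_map[channel])


def retarget_pattern(core: Core, pattern: Dict[str, str]) -> Dict[str, str]:
    """Rename a core-level pattern's channels to chip-level pin names."""
    return {retarget(core, ch): bits for ch, bits in pattern.items()}


# Three levels: chip -> sub-system -> core (all names are made up).
chip = Core("chip")
subsystem = Core("subsystem", chip, {"s_in0": "gpio3", "s_in1": "gpio4"})
core_a = Core("core_a", subsystem, {"edt_in0": "s_in1"})
print(retarget_pattern(core_a, {"edt_in0": "0110"}))  # {'gpio4': '0110'}
```

Adding a fourth level only means adding another `Core` node with its own `channel_map`; the retargeting function itself does not change.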
There are several challenges in planning and implementing hierarchical scan test in SOCs, most of them related to providing access to the scan channels in the cores. A scan channel is a channel connecting to the inputs/outputs of scan chains, the inputs/outputs of test controllers for test compression, or a combination thereof. When core-level patterns are retargeted and merged at the top level, usually only a subset of the cores is tested at any given time, for two reasons: first, the power dissipation may not allow all cores to be tested concurrently; and second, the number of chip-level Inputs/Outputs (I/Os, or ports) does not allow all core-level channels to be accessed simultaneously.
For any group of cores that are to be tested concurrently, their channel inputs and outputs need to be connected to different chip-level I/Os when the conventional point-to-point scan access methods (sometimes referred to as star or switch topologies) are employed. Since there are usually more core-level channels than chip-level I/Os available for scan, pin availability limits the number of cores that can be tested concurrently and increases the number of groups (test sessions). Each top-level I/O can connect to a different core-level pin in each group. Over time, the number of cores keeps growing while the number of chip-level I/Os available for scan test keeps shrinking, so fewer and fewer cores can be accessed directly from chip-level I/Os and tested concurrently.
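To make the pin constraint concrete, the sketch below packs cores into test sessions so that, within each session, the summed core-level input and output channel counts fit the available chip-level scan pins. It uses a greedy first-fit-decreasing heuristic, which is not guaranteed to find the minimum number of sessions, and it ignores the power constraint. The function name and the channel counts in the example are hypothetical.

```python
from typing import Dict, List, Tuple


def group_cores(channels: Dict[str, Tuple[int, int]],
                pins_in: int, pins_out: int) -> List[List[str]]:
    """Greedy first-fit-decreasing grouping of cores into test sessions.

    channels maps core name -> (input channels, output channels). With
    point-to-point (mux-based) access, every core in a session needs its
    own chip-level pins, so each session's channel totals must fit the pins.
    """
    for name, (ci, co) in channels.items():
        if ci > pins_in or co > pins_out:
            raise ValueError(f"{name} cannot be accessed even on its own")
    sessions: List[List[str]] = []
    used: List[List[int]] = []  # [inputs used, outputs used] per session
    # Place the cores with the most channels first.
    order = sorted(channels, key=lambda n: sum(channels[n]), reverse=True)
    for name in order:
        ci, co = channels[name]
        for idx, (ui, uo) in enumerate(used):
            if ui + ci <= pins_in and uo + co <= pins_out:
                sessions[idx].append(name)
                used[idx] = [ui + ci, uo + co]
                break
        else:  # no existing session has room: open a new one
            sessions.append([name])
            used.append([ci, co])
    return sessions


cores = {"core_a": (4, 4), "core_b": (3, 3), "core_c": (2, 2),
         "core_d": (2, 2), "core_e": (2, 2)}
print(group_cores(cores, pins_in=6, pins_out=6))
# [['core_a', 'core_c'], ['core_b', 'core_d'], ['core_e']]
```

With 6 input and 6 output pins, the five cores need three sessions; with 16 pins each, they fit into one. Fewer scan pins therefore translate directly into more test sessions.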
Part of the planning is to identify, up front, groups of cores that will be tested concurrently and to plan the connectivity between chip-level I/Os and core-level channels for each static configuration. This yields sub-optimal results because it fixes the core groupings, often before the cores are available and their test pattern counts can be estimated. In addition, the number of channels needed by each core can only be optimized after the core is available and Automatic Test Pattern Generation (ATPG) can be performed on it. This information, however, becomes available only late in the design cycle, whereas the number of core-level channels already affects the static core grouping and the connectivity plan. All this creates dependencies between the core-level design and the top-level design, and forces sub-optimal decisions to be made early on. As the number of levels of core hierarchy increases, the complexity multiplies.
Adding to the complexity are physical implementation (layout) considerations. Connecting multiple cores to each I/O can lead to routing congestion. The I/Os can also be embedded inside cores when flip-chip technology is used. As a result, the connections for one core affect the design of the other cores to which its signals have to be connected, or through which its scan connections are routed.
FIG. 1A illustrates an example of a circuit 100. The circuit 100 has five cores (circuit blocks) 110, 120, 130, 140 and 150. Among them, the cores 140 and 150 are the same core instantiated twice, known as identical core instances. General-Purpose I/O (GPIO) pads are commonly used for scan access at the chip level. The embedded deterministic test (EDT) blocks 115, 125, 135, 145, 155, 165, and 175, are where scan data are loaded and unloaded. There is scan logic in each of the cores, as well as at the chip level.
FIG. 1B illustrates one retargeting mode for testing the circuit 100. In this mode, access between the chip-level I/Os (the GPIO pads) and the cores 110, 120 and 130 is established. The EDT blocks 115, 125 and 135 are shown in the active mode for testing the cores 110, 120 and 130, respectively. FIG. 1C illustrates another retargeting mode for testing the circuit 100. Here, the identical core instances 140 and 150 can be tested because the scan channels in these two cores can be accessed through the GPIO pads, and the EDT blocks 145 and 155 are in the active mode. When different core-level scan channels connect to the same I/Os in different groups, multiplexing needs to be added. The multiplexer controls can be programmed statically, once at the start of each test session.
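The static multiplexer programming can be sketched for the output side. Each chip-level scan output pin gets a multiplexer over every core-level output channel that drives it in some session, and a select value is loaded once per session. This is an illustrative model only. The function name, the pin and channel names, and the choice to number mux inputs in order of first use are assumptions. The input side would use the analogous per-channel selection.

```python
from typing import Dict, List, Tuple


def build_mux_config(sessions: List[Dict[str, str]]
                     ) -> Tuple[Dict[str, List[str]], List[Dict[str, int]]]:
    """Derive per-pin multiplexer inputs and per-session select values.

    sessions[k] maps chip-level output pin -> core-level output channel
    observed on that pin during session k. Returns (mux_inputs, selects):
    mux_inputs[pin] lists the channels wired to that pin's multiplexer, and
    selects[k][pin] is the select value programmed at the start of session k.
    """
    mux_inputs: Dict[str, List[str]] = {}
    for assignment in sessions:
        for pin, channel in assignment.items():
            inputs = mux_inputs.setdefault(pin, [])
            if channel not in inputs:  # each channel is wired only once
                inputs.append(channel)
    selects = [{pin: mux_inputs[pin].index(ch) for pin, ch in a.items()}
               for a in sessions]
    return mux_inputs, selects


# Hypothetical two-session plan, loosely following FIGS. 1B and 1C.
plan = [{"gpio0": "core_110.out0", "gpio1": "core_120.out0"},
        {"gpio0": "core_140.out0", "gpio1": "core_150.out0"}]
muxes, sel = build_mux_config(plan)
print(muxes["gpio0"], sel)
```

Every additional session that reuses a pin widens that pin's multiplexer, which is one reason static grouping decisions ripple into the physical design.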
FIG. 1D illustrates an example of the external test mode. In this mode, the logic at the top level plus the boundaries of the cores is tested. So at least the EDT controllers 165 and 175 at the top level need to be driven (in the active mode). Based on the scan architecture, some EDT controllers inside the cores (that drive the boundary scan chains in the cores) may also need to be accessed simultaneously.
A relatively recent trend in SOC design, referred to as tile-based layout, is adding further complexity and constraints to DFT architectures. In tile-based designs, virtually all logic and routing is done within the cores and not at the top level. The cores abut one another when integrated into the chip, with connections flowing from one core to the next. Any connectivity between two cores has to flow through the cores that lie between them. Logic that is logically at the top level has to be pushed into the cores and designed as part of them. FIG. 1E illustrates an example of a tile-based circuit and two retargeting modes. Compared to FIGS. 1B-1D, no logic or routing can occur at the top level in FIG. 1E, and all logic and connections are pushed into the cores.
When retargeting core-level patterns, limited chip-level I/O counts may be dealt with by increasing the number of core groups, as long as there are enough I/Os to drive at least each core individually. However, there are cases where access to multiple cores simultaneously, including access to all cores simultaneously, is necessary and grouping cores into smaller groups is not an option.
In an ideal hierarchical test, the internals of each core are tested when retargetable patterns are generated for the core, and the periphery of the core plus logic at the next level up are tested when pattern generation is performed at the next level and the lower level cores are placed into their external test modes. However, there are cases where pins of a core cannot be wrapped to provide this isolation, and the only way to test connections to/from cores is to run ATPG on them simultaneously while wrapping is disabled. To cover such logic, it is often necessary to test groups of cores simultaneously. If testing 8 cores simultaneously, for example, then with the traditional mux-based access, there must be enough chip-level I/Os to drive the channels of the 8 cores concurrently.
There are also cases where all EDT blocks must be accessed concurrently. For example, IDDQ is a test where data is scanned into the scan chains, then the current used by the entire chip is measured. If it exceeds a threshold, that indicates a silicon defect. IDDQ is usually applied across the entire chip, so for optimal efficiency, all scan chains in the entire design need to be loaded with every IDDQ scan pattern. When using scan compression like EDT, that means there must be enough I/Os to drive all the EDT channels of the cores concurrently.
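The arithmetic behind these two cases is simple, but it shows why mux-based access breaks down. When a set of cores must be driven simultaneously, their channel counts add up, because channels cannot share pins. The sketch below computes the shortfall in chip-level scan pins. The function name and the channel and pin counts in the example are hypothetical.

```python
from typing import Dict, Tuple


def pin_shortfall(required: Dict[str, Tuple[int, int]],
                  pins_in: int, pins_out: int) -> Tuple[int, int]:
    """Extra chip-level scan pins needed to drive a set of cores at once.

    required maps core name -> (input channels, output channels). With
    point-to-point (mux-based) access, the channels of concurrently tested
    cores cannot share pins, so the totals simply add up.
    Returns (missing inputs, missing outputs); (0, 0) means the set fits.
    """
    need_in = sum(ci for ci, _ in required.values())
    need_out = sum(co for _, co in required.values())
    return max(0, need_in - pins_in), max(0, need_out - pins_out)


# Eight cores that must be tested together (e.g. unwrapped interconnect),
# each with 2 input and 2 output channels, but only 8 + 8 scan pins.
eight_cores = {f"core{i}": (2, 2) for i in range(8)}
print(pin_shortfall(eight_cores, pins_in=8, pins_out=8))  # (8, 8)
```

For IDDQ, `required` is simply every core in the design, so the shortfall grows with every core added to the chip.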
Ideally, the channel count requirements within the chip should be decoupled from the chip-level I/O counts such that fewer chip-level I/Os can drive an arbitrarily larger number of core-level channels.
Pattern retargeting to identical circuit blocks can benefit from the fact that all instances are identical. If the cores are wrapped and isolated from their surroundings, as is usually done in a hierarchical test methodology, then the stimuli and responses of all core instances are identical. ATPG only needs to be run once to generate the patterns for that core, regardless of the number of core instances. Since the inputs are identical, they can be broadcast to all core instances through pipelined inputs. This reduces the number of I/Os needed to test the cores concurrently. Challenges remain, however. Although broadcast can keep the number of I/Os needed for channel inputs constant, the outputs typically still have to be observed independently to guarantee the same test coverage achieved at the core level and to provide enough observability for diagnosing failing cores. If at least one output channel is needed per core instance, this limits the number of identical core instances that can be tested concurrently, just as similar limitations apply to heterogeneous core instances.
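The following sketch quantifies the broadcast trade-off. Broadcast makes the input pin count independent of the number of instances. Output pins still scale with it, so output pins alone cap how many identical instances can be tested together. The function names and channel counts are illustrative assumptions.

```python
from typing import Tuple


def scan_pins_needed(n_instances: int, in_ch: int, out_ch: int,
                     broadcast: bool = True) -> Tuple[int, int]:
    """(input pins, output pins) to test n identical core instances at once.

    With broadcast, all instances share the same input pins; outputs are
    still observed per instance, so output pins scale with n.
    """
    inputs = in_ch if broadcast else n_instances * in_ch
    return inputs, n_instances * out_ch


def max_concurrent_instances(out_ch: int, pins_out: int) -> int:
    """Upper bound on instances tested together, set by output pins alone."""
    return pins_out // out_ch


# Hypothetical core with 3 input and 2 output channels.
print(scan_pins_needed(4, 3, 2))                   # (3, 8)
print(scan_pins_needed(4, 3, 2, broadcast=False))  # (12, 8)
print(max_concurrent_instances(2, pins_out=10))    # 5
```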
Moreover, the capture cycles are usually synchronized and delivered from a common clock source. To align the capture cycles and the shift sequence while data is being broadcast, the same number of pipeline stages must be used for every core instance, regardless of how close the core is to the pad driving it, as illustrated in FIG. 14. This is further complicated when a tiling methodology is used. With tiling, there can be no routing or logic outside the cores. The cores must therefore meet the pipelining and connectivity criteria mentioned above while forming connections through abutment and while being multiple instances that are identical copies of the same module. This can be achieved with programmable pipelining and channel output routes, but it adds complexity and limits the reuse of cores, since designing a new chip with more core instances requires redesigning the cores to account for differences in pipelining and channel routing.
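Pipeline equalization can be illustrated with a small sketch. Each instance needs some number of stages anyway because of its distance from the pad. All instances are padded to the deepest one, so the broadcast stream arrives at every core input on the same cycle. The instance names and stage counts are hypothetical, and the pipeline flip-flops are modeled as resetting to 0.

```python
from collections import deque
from typing import Dict, List


def padding_stages(natural_depth: Dict[str, int]) -> Dict[str, int]:
    """Extra pipeline stages so every instance sees the same total depth.

    natural_depth[inst] is the number of stages an instance needs anyway
    (e.g. because of its distance from the pad); all instances are padded
    to the deepest one so broadcast data and capture stay aligned.
    """
    depth = max(natural_depth.values())
    return {inst: depth - d for inst, d in natural_depth.items()}


def delayed_stream(bits: List[int], stages: int) -> List[int]:
    """Bits seen at a core input behind `stages` flip-flops reset to 0."""
    pipe = deque([0] * stages)
    seen = []
    for b in bits:
        pipe.append(b)             # shift a new bit into the pipeline
        seen.append(pipe.popleft())  # and observe the bit leaving it
    return seen


natural = {"near": 1, "mid": 2, "far": 4}  # hypothetical distances
pad = padding_stages(natural)              # {'near': 3, 'mid': 2, 'far': 0}
stimulus = [1, 0, 1, 1, 0, 0, 1]
streams = {i: delayed_stream(stimulus, natural[i] + pad[i]) for i in natural}
# Every instance now receives the identical, equally delayed stream.
assert len({tuple(s) for s in streams.values()}) == 1
```

Without padding, the "near" instance would see the stimulus three cycles earlier than the "far" one, so a shared capture clock would capture different data in different instances.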
A general packet-based core access architecture has been proposed. In this architecture, each parallel word includes the address of the core (or core group) for which the information is destined, an opcode indicating what to do with that data, and the actual payload. This architecture can work for both heterogeneous and identical cores. For identical cores, it supports efficient broadcast of stimuli and expected values (good-machine responses), on-chip comparison, and accumulation of pass/fail data, such that multiple identical cores can be tested in near-constant time. This architecture, however, is not efficient, because there is significant overhead in every parallel word: the information that is not payload, namely the address and opcode, occupies a certain number of bits in each word. A very narrow bus would therefore not be able to support this architecture.
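The per-word overhead can be estimated with a short sketch. It assumes a hypothetical `[address | opcode | payload]` layout in which the address and opcode fields are each sized to the minimum number of bits that encodes all targets and opcodes. The actual word format of the proposed architecture may differ. The field widths, target count and opcode count below are illustrative only.

```python
from math import ceil, log2
from typing import Tuple


def field_widths(word_bits: int, n_targets: int,
                 n_opcodes: int) -> Tuple[int, int, int]:
    """(address, opcode, payload) bit widths of one parallel word.

    Hypothetical layout: address and opcode are each sized to the minimum
    number of bits that can encode all targets / opcodes.
    """
    addr = max(1, ceil(log2(n_targets)))
    op = max(1, ceil(log2(n_opcodes)))
    payload = word_bits - addr - op
    if payload <= 0:
        raise ValueError(f"{word_bits}-bit word leaves no room for payload")
    return addr, op, payload


def pack(addr: int, op: int, payload: int,
         widths: Tuple[int, int, int]) -> int:
    """Pack fields MSB-first as [address | opcode | payload]."""
    a, o, p = widths
    for value, bits in ((addr, a), (op, o), (payload, p)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
    return (addr << (o + p)) | (op << p) | payload


for width in (32, 8):
    a, o, p = field_widths(width, n_targets=8, n_opcodes=4)
    print(width, "bits: payload", p, f"({p / width:.1%} of the word)")
```

Under these assumptions, a 32-bit word carries 27 payload bits (84.4%), an 8-bit word only 3 (37.5%), and a 4-bit word cannot carry any payload at all. This is the narrow-bus limitation described above.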