Improvements in semiconductor process technology have enabled larger, more complex systems to be integrated onto a single integrated circuit chip. However, the number of physical wire interfaces to the chip is often limited, allowing only a fraction of internal signals to be visible from the chip's I/O. For example, when a large memory and a processor are integrated together as a system-on-a-chip (SOC), the address and data between the processor and the memory may not be visible on the chip's I/O. This makes debug and testing of the chip significantly more difficult, since an external logic analyzer can only observe the chip's I/O.
Internal nodes may be routed to a central block that copies data on the internal nodes to external I/O, such as through scan chains or JTAG. However, the number of internal nodes that can be sampled is often small. Sampling may not be possible at full system speeds and may be quite awkward, necessitating time-consuming scan-out of data from serial chains.
It is usually not known in advance exactly which nodes need to be observed during debug. A designer cannot know for certain which parts of the chip will have design flaws and will be the subject of an intense debug effort. Chip designers would prefer to have as many internal nodes as possible available for sampling during debug testing. However, the additional logic and wiring needed to allow sampling of many internal nodes can be expensive.
Local history buffers may be added to key internal buses inside the chip. These buffers allow for traces of bus cycles to be captured and later read out of the chip during debug testing. However, these buffers can be expensive when many buffers or deep buffers are used. Control of these buffers can be difficult as there may be no internal triggering mechanism.
FIG. 1 shows a multi-processor system chip. Processor cores 10, 10′, 10″ are integrated together onto IC chip 20. Each processor core may execute a separate stream of instructions and each accesses its own local cache memory, caches 12, 12′, 12″. When data is not found in the local cache memory (a cache miss), memory controllers 14, 14′, 14″ fetch the desired data from an external memory, such as using an external bus to a large external main memory.
Snoop tags 16 contain directory information about the entries currently being stored in caches 12, 12′, 12″. Cache coherency is achieved through the use of snoop tags 16, perhaps in conjunction with external directories and other controllers.
Self-test logic and test controllers may also be integrated onto a very-large-scale-integration (VLSI) chip. Test controller 18 may be included on IC chip 20. Test controller 18 may be activated by a combination or sequence of signals on external pins that activates a test mode.
FIG. 2 shows prior-art test scan chains in a large chip. Test scan chains are often inserted into chips to aid automated testing. Special chip-design software can replace ordinary D-type flip-flops with testable or scan flip-flops 30 that have two D inputs and two clocks. The extra clock inputs to scan flip-flops 30 are driven by test clock TCK, which can be applied to an external pin of the chip and may be buffered or gated (not shown). The normal clocks are stopped during test mode and TCK is pulsed to scan in and out data along the scan chains. The extra D inputs to scan flip-flops 30 are connected to Q outputs of other scan flip-flops 30 to form a scan chain along scan flip-flops 30.
The first scan flip-flop 30 in the scan chain has a second D input that receives a test-input TI from an external pin, while the Q output of the last scan flip-flop 30 in the chain drives a test output TO that can be read by an external tester and compared to expected data.
When a large chip has multiple CPU blocks 22, 22′, 22″, the Q output of the last scan flip-flop 30 in one CPU block 22 can drive the D test input of the first scan flip-flop 30 in second CPU block 22′. Likewise, the Q output of the last scan flip-flop 30 in second CPU block 22′ can drive the D test input of the first scan flip-flop 30 in third CPU block 22″. Thus test scan chains of scan flip-flops 30 in CPU blocks 22, 22′, 22″ may be chained together into one long scan chain.
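The shifting behavior described above can be illustrated with a minimal behavioral sketch in Python. This is not a model of any particular chip: the function name `scan_shift`, the three block contents, and the convention that index 0 is the flip-flop fed by TI are all assumptions made for illustration. Each loop iteration stands for one pulse of test clock TCK.

```python
from typing import List, Tuple


def scan_shift(chain: List[int], test_in: List[int]) -> Tuple[List[int], List[int]]:
    """Shift test_in through the chain, one bit per TCK pulse.

    chain[0] is the flip-flop fed by TI; chain[-1] drives TO.
    Returns (new chain state, bits observed on TO).
    """
    state = list(chain)
    test_out = []
    for bit in test_in:
        test_out.append(state[-1])   # TO shows the last flip-flop's Q output
        state = [bit] + state[:-1]   # each flip-flop captures its neighbor's Q
    return state, test_out


# Hypothetical contents of the scan flip-flops in three CPU blocks.
block_a = [1, 0, 1]
block_b = [0, 0, 1, 1]
block_c = [1, 1]

# Chaining the blocks: TI -> block_a -> block_b -> block_c -> TO.
long_chain = block_a + block_b + block_c

# Scanning out every bit takes one TCK pulse per flip-flop in the whole chain.
new_state, observed = scan_shift(long_chain, [0] * len(long_chain))
```

In this sketch the bits appear on TO in reverse chain order, so the flip-flops nearest TO are seen first, and the number of TCK pulses needed grows with the combined length of all chained blocks. The original flip-flop contents are replaced by the scanned-in data once the scan-out completes.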
While useful, the long scan chain of scan flip-flops 30 through many CPU blocks 22, 22′, 22″ can be excessively long, requiring many pulses of test clock TCK to scan data in and out. Testing may be inefficient, increasing test times and test costs. Isolating test failures to particular CPU blocks may be quite difficult since the scan chains from different blocks are strung together into one long scan chain. The tester log file may have to be examined to determine which of CPU blocks 22, 22′, 22″ caused the test failure.
Scan chains do allow for some internal nodes to be observed. However, a lengthy scan-out sequence is needed to scan a sampled node out through the many flip-flops 30 in a scan chain. Determining the exact time of sampling may be difficult. Also, there is no provision for triggering or initiating sampling on an event such as a certain value of internal nodes. Instead, internal nodes have to be scanned out and then compared to the trigger value. The scan-out process itself can over-write internal nodes, causing internal nodes to change values. Thus capturing a sequence of internal states instead of just a one-time sample is difficult or impossible using scan chains.
FIG. 3 shows a multi-processor chip with local history buffers. Debug buffer 24 can capture signals such as address and data on a bus between processor core 10 and cache 12. Debug buffer 26 can capture signals such as address and data on another bus between cache 12 and memory controller 14. Debug buffer 28 can capture address, data, or other information on a snoop bus that connects memory controllers 14, 14′ and snoop tags 16.
Likewise, debug buffers 24′, 26′ capture internal bus signals between processor core 10′, cache 12′, and memory controller 14′. These internal buses may be observed by triggering debug buffers 24, 24′, 26, 26′, 28 to begin to write data from their bus, and then halting the processor clock or otherwise ending writing of data into the debug buffers. The debug buffers are then read by some mechanism.
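The capture sequence just described (trigger, write bus cycles, halt, read out) can be sketched behaviorally in Python. The class name `DebugBuffer`, its method names, the depth of four entries, and the bus values are hypothetical. The sketch also assumes that a full buffer keeps the most recent cycles and discards the oldest, which the description above does not specify.

```python
from collections import deque


class DebugBuffer:
    """Behavioral sketch of a local history buffer on one internal bus."""

    def __init__(self, depth: int) -> None:
        # Assumption: when full, the oldest entry is dropped.
        self.samples = deque(maxlen=depth)
        self.capturing = False

    def trigger(self) -> None:
        """Begin writing bus cycles into the buffer."""
        self.capturing = True

    def halt(self) -> None:
        """Stop writing, for example when the processor clock is halted."""
        self.capturing = False

    def clock(self, address: int, data: int) -> None:
        """Called once per bus cycle with the values on the bus."""
        if self.capturing:
            self.samples.append((address, data))

    def read_out(self) -> list:
        """Return the captured trace, oldest entry first."""
        return list(self.samples)


buf = DebugBuffer(depth=4)
for cycle in range(10):
    if cycle == 3:
        buf.trigger()   # capture starts at cycle 3
    if cycle == 8:
        buf.halt()      # capture ends before cycle 8 is written
    buf.clock(address=0x100 + cycle, data=cycle * 2)

trace = buf.read_out()
```

Five bus cycles fall between the trigger and the halt here, but the four-entry buffer retains only the last four, which shows how a shallow buffer limits the trace that can be recovered.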
While useful, the size of debug buffers 24, 24′, 26, 26′, 28 may be limited, allowing only a few samples to be captured. When debug buffers 24, 24′, 26, 26′, 28 are made larger, the overall size and cost of these buffers can increase significantly, especially when many processors and caches are integrated onto the same chip. The use of debug buffer memory is inefficient, since only one bus may need to be watched, yet many debug buffers are provided since it is not known in advance which bus will need to be watched. Also, there may be no provision for triggering on an internal event or bus value.
What is desired is an on-chip logic analyzer for debug testing of a large chip. An on-chip logic analyzer that can observe many internal nodes from many different blocks is desired. An on-chip logic analyzer that can trigger sampling and storage of these nodes using internal bus states is desirable. An on-chip logic analyzer that does not significantly degrade performance of the chip, yet is able to sample signals at a high rate is further desired.