1. Technical Field
The present application relates generally to an improved data processing system and method. More specifically, the present application is directed to a system and method for streaming high-frequency trace data off chip.
2. Description of Related Art
Chip debugging practices rely heavily on capturing signal state transitions in on-chip arrays, referred to as trace arrays, to understand the at-speed behavior of internal processor cores, bus interfaces, and various other components within a chip. Debug information captured in trace arrays can be used to identify logic design errors, timing failures, and performance bottlenecks. Runtime visibility into a large number of signals over an extended period of time enables quick diagnosis of elusive problems. Successful trace systems provide significant signal visibility without incurring excessive costs in area, power, and complexity.
Typical on-chip trace architectures contain trace arrays that store data, multiplexer networks that select data, and control systems that govern how data are captured. Such an architecture can record a partial snapshot of chip behavior around the time of a failure, which helps identify offending logic that does not function as the system requires. A drawback of this architecture is the limited size of the sampling window, which results from constrained memory and chip input/output (I/O) resources.
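The limited sampling window can be illustrated with a small behavioral model. The sketch below is a hypothetical Python model, not the circuit described in this application: the class `TraceArray` and its parameters `depth` and `post_trigger` are illustrative names, the multiplexer network is abstracted away (each call to `sample` receives an already-selected value), and the trace array is assumed to behave as a circular buffer that freezes a fixed number of cycles after a trigger.

```python
from collections import deque


class TraceArray:
    """Hypothetical model of a fixed-depth on-chip trace array.

    Samples are written continuously into a circular buffer; once a trigger
    fires, `post_trigger` more samples are captured and the array freezes.
    The captured window can never exceed `depth` samples.
    """

    def __init__(self, depth: int, post_trigger: int) -> None:
        if not 0 <= post_trigger <= depth:
            raise ValueError("post_trigger must be in [0, depth]")
        self.buffer = deque(maxlen=depth)  # oldest samples are overwritten
        self.post_trigger = post_trigger
        self.remaining = None  # None means the trigger has not fired yet
        self.frozen = False

    def sample(self, value, trigger: bool = False) -> None:
        """Record one cycle of (already multiplexed) signal state."""
        if self.frozen:
            return  # capture has stopped; later activity is lost
        self.buffer.append(value)
        if self.remaining is None and trigger:
            self.remaining = self.post_trigger  # start post-trigger countdown
        elif self.remaining is not None:
            self.remaining -= 1
        if self.remaining == 0:
            self.frozen = True

    def snapshot(self) -> list:
        """Return the captured window, oldest sample first."""
        return list(self.buffer)


# Simulate 100 cycles (sample value = cycle number) with a trigger at cycle 50.
trace = TraceArray(depth=8, post_trigger=3)
for cycle in range(100):
    trace.sample(cycle, trigger=(cycle == 50))
print(trace.snapshot())  # [46, 47, 48, 49, 50, 51, 52, 53]
```

Although 100 cycles are simulated, only the eight samples surrounding the trigger survive; every earlier sample has been overwritten, which is the sampling-window limitation of small on-chip arrays.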
Tradeoffs among cost, area, and power consumption may result in a design with relatively small trace arrays, which may not be sufficient for complete internal visibility into the design. Some solutions provide additional trace depth by sending trace data to main memory or by routing trace data through chip output pins to an external storage device. Trace systems that use main memory for trace data storage offer significantly larger storage capacity; however, these systems are severely limited by several factors. First, the actual trace memory depth is not fixed: it is determined at runtime by how much main memory can be allocated to tracing in a given scenario. By contrast, dedicated external trace hardware, such as a logic analyzer, typically supports far greater capacity without compromising the resources available on chip.
Another limitation of main memory storage is the interference inherent in recording trace data. The trace engine and other system components, such as the processor bus interface, share a common data path to the memory system. Therefore, to record trace data, either the processor must be stalled or the trace engine must steal available bus cycles. Either approach may alter the state of the system and thus corrupt the behavior of the device under test (DUT).
Although chip output pins provide a conduit that would appear to solve the capacity problem, existing systems have not used this option to capture wide buses of at-speed data without discarding data. Lossless tracing through the chip output pins is not possible without some processing mechanism, because the functions being traced typically operate at clock frequencies much higher than the chip output pins can support. Common tracing solutions that rely on debug output pins discard data samples, encode the data in a way that selectively discards information within a trace, or both, so that the data rate is reduced enough to fit within the output pin bandwidth. These solutions do not provide a lossless mechanism for recording and recovering the complete trace.
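The bandwidth mismatch can be stated as a simple inequality. In the sketch below, the symbols W, f_trace, N, and f_pin are introduced for illustration only and do not appear elsewhere in this application; each pin is assumed to carry one bit per pin clock, and the numbers in the worked example are hypothetical rather than measurements of any particular chip.

```latex
% Hypothetical symbols (introduced for illustration):
%   W        : width of the traced bus, in bits per sample
%   f_trace  : clock frequency of the traced logic, in samples per second
%   N        : number of chip output pins dedicated to trace
%   f_pin    : maximum data rate per output pin, in bits per second
\[
  \underbrace{W \, f_{\mathrm{trace}}}_{\text{generated trace bandwidth}}
  \;\le\;
  \underbrace{N \, f_{\mathrm{pin}}}_{\text{available pin bandwidth}}
  \qquad \text{(required for lossless, unprocessed streaming)}
\]
% Illustrative numbers only: W = 64, f_trace = 2 GHz, N = 16, f_pin = 1 Gb/s
\[
  64 \times 2\,\mathrm{GHz} = 128\,\mathrm{Gb/s}
  \;>\;
  16 \times 1\,\mathrm{Gb/s} = 16\,\mathrm{Gb/s}
\]
% The traced logic produces data 128 / 16 = 8 times faster than the pins can
% drain it, so samples must be dropped unless the data are processed first.
```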
One known solution for expanding the effective trace capture window of on-chip arrays is data compression, commonly using a lossless algorithm such as run-length encoding (RLE). Compression reduces the amount of data that must be stored in the array. With RLE, for instance, each distinct pattern is stored to memory only once per run, together with a repeat count that indicates the number of consecutive duplicate samples. While this allows a trace array to capture more data over a longer period of time, the approach is still limited by the compressibility of the trace data and, when the data is stored exclusively in on-chip arrays, by the array capacity.
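The RLE scheme can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the encoder of any particular trace system: the function names `rle_encode` and `rle_decode` are hypothetical, each entry stores a sample value together with the total length of its run (hardware implementations may instead store a count of additional duplicates), and the two example traces are invented to contrast compressible and incompressible data.

```python
from collections.abc import Hashable, Sequence
from itertools import groupby


def rle_encode(samples: Sequence[Hashable]) -> list[tuple[Hashable, int]]:
    """Collapse each run of identical consecutive samples into (value, run_length)."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(samples)]


def rle_decode(entries: Sequence[tuple[Hashable, int]]) -> list[Hashable]:
    """Expand (value, run_length) entries back into the original sample stream."""
    samples: list[Hashable] = []
    for value, count in entries:
        samples.extend([value] * count)
    return samples


# Hypothetical traces: a mostly idle bus versus a signal that toggles every cycle.
idle_bus = [0x0] * 6 + [0xA] * 3 + [0x0] * 7
toggling = [0, 1] * 8

print(rle_encode(idle_bus))       # [(0, 6), (10, 3), (0, 7)]
print(len(rle_encode(toggling)))  # 16 entries for 16 samples: no reduction
assert rle_decode(rle_encode(idle_bus)) == idle_bus  # lossless round trip
```

The toggling signal produces one entry per sample, and in hardware each entry would also carry a count field, so its compressed form would be larger than the raw trace. RLE therefore extends capture depth only when the traced signals contain long runs of repeated values.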